ODBMS Industry Watch

Jul 27 26

A New Era for MySQL: Heather VanCura and Jason Wilcox on Open Source, Community Governance, and Where MySQL Is Headed

by Roberto V. Zicari

“Through transparent roadmaps, community-driven collaboration, contributor programs, and the MySQL Governance model, we aim to create an environment where innovation can accelerate while preserving the reliability, compatibility, security, and operational excellence that organizations around the world depend on.”

Q1. Oracle has announced a “new era” of MySQL community engagement at MySQL’s 30th anniversary. Can you walk us through what specifically prompted this strategic shift, and what concrete changes can the community expect to see in how Oracle approaches MySQL development and governance?

HV: Throughout 2025 we celebrated 30 years of MySQL and reflected on the past and present and, more importantly, the future. The MySQL Community team sought feedback from around the globe on how to lead the next generation of MySQL innovation and open source collaboration. We came to Jason in November, shared that feedback, and proposed a plan to rebuild community trust. By December we agreed on a plan, calling it a new era of Community Engagement.

We have entered a new phase of collaboration with the MySQL Community, focused on faster innovation, greater transparency, deeper community collaboration, and an expanding ecosystem. Starting with the April 2026 release, we’re delivering more features directly into the MySQL Community Edition core while preserving the stability customers rely on.

As part of this effort, we introduced the MySQL Governance model, which provides clear pathways for participation, community leadership, and long-term collaboration with the broader MySQL ecosystem. Together, these initiatives are designed to build deeper trust, accelerate innovation, and grow the MySQL ecosystem.

Q2. One of the most significant announcements is moving previously commercial-only features into the MySQL Community Edition. What drove this decision, and what other enterprise features are you planning to bring to the community edition in the coming months?

JW: Both our community and our customers are asking for stability and faster innovation. At the same time, they want more visibility into our roadmap and a stronger voice in shaping it. Bringing some previously Enterprise-only features into the Community Edition addresses both needs, and alongside those features we continue to build, prioritize, and deliver new features and innovations in MySQL.

With the GA of MySQL 9.7.0 LTS, MySQL moves from the 9.x innovation series to a new Long-Term Support release line. This begins the 9.7.x LTS series, giving users a stable branch to standardize on while continuing to build on the innovation delivered through the 9.x cycle.

This release matters not only because it establishes the next LTS baseline, but because it reflects a broader direction for MySQL. Over the last several releases, we have talked about giving users earlier visibility into what is coming, broadening access to important capabilities, and working more openly with the MySQL community. With MySQL 9.7.0 LTS, that direction is reflected in the product itself.

Several capabilities previously limited to MySQL Enterprise Edition are now available in MySQL Community Edition, while Dynamic Data Masking is now available in MySQL Enterprise Edition. Together, these changes make MySQL 9.7.0 LTS a meaningful release for DBAs, developers, and operators across both editions.

More capability in MySQL Community Edition

One of the biggest themes in MySQL 9.7.0 LTS is the continued expansion of MySQL Community Edition. Across 4 major technical areas, this release delivers 8 notable new Community Edition capabilities — a substantial broadening of what DBAs and developers can do with Community Edition.

The 4 major areas

Replication observability and HA behavior
- Flow-control monitoring
- Multi-threaded applier extended statistics
- Automatic Eviction & Rejoin
- Up-to-date Aware Primary Election

Telemetry and observability integration
- Telemetry / OpenTelemetry support

Modern application development
- MySQL JSON Duality Views

Query optimization and performance
- Hypergraph Optimizer
- Profile-Guided Optimization (PGO)

Q3. Some community members have expressed concerns about MySQL’s development velocity and commit rates. Jason, as SVP of Data Services, what specific steps are you taking to address these concerns, and how do you plan to balance cloud service development with core MySQL innovation?

JW: Oracle has invested heavily in MySQL since 2010, and we hear feedback from the community. People want to see that investment show up in a more visible way, especially through faster delivery in the open. We’re working on that in a few concrete ways: getting more features into MySQL Community Edition, sharing more of the roadmap and worklogs, using Early Access releases to get feedback earlier, and creating more public forums where contributors can talk directly with the MySQL engineering team.

We’re also putting more structure around how people can participate, through the MySQL Governance model, contributor summits, design discussions, and clearer contribution paths. The goal is straightforward: be more open about where MySQL is going and give the community more practical ways to influence priorities, test features earlier, report issues, and contribute improvements. Cloud and core MySQL are not separate priorities for us — the core database is the foundation for Community, Enterprise, and HeatWave, so continued innovation in MySQL itself remains central to everything we’re doing.

In addition to accelerating innovation, we are creating more opportunities for community participation through public roadmaps, Early Access releases, public discussions, contributor summits, and the MySQL Governance model. Together, these initiatives provide greater transparency into our priorities while creating structured mechanisms for contributors to participate, provide feedback, and help influence the future direction of MySQL.

Q4. Oracle has published the MySQL Community roadmap and promised to facilitate community contributions through worklogs and bug reports. How will this differ from past practices, and what mechanisms are you putting in place to ensure transparent, bidirectional communication between Oracle’s engineering team and external contributors?

HV: With our Community Engagement Plans, MySQL customers and users get the best of both worlds: enterprise-grade stability and faster access to innovation. They’ll also have greater visibility into what’s coming and more opportunities to provide input, which helps them align MySQL with their own technology roadmaps. In addition to publishing select worklogs and CVE information, we continued Labs for new features and Early Access releases leading up to the 9.7 launch. Both will continue in future releases, with our next Early Access planned for early July. These provide valuable insight and transparency to community members and invaluable feedback to the engineering team.

We have organized a series of public discussions (four so far, with a fifth planned for July) and established a quarterly Contributor Summit and regular design meetings under the MySQL Governance model. The first Contributor Summit took place in May 2026, with a design meeting held the week prior. The next Contributor Summit is scheduled for August 2026 in Broomfield, Colorado.

The governance model provides structured pathways for participation through code contributions, testing, documentation, reviews, technical discussions, and community leadership. It introduces clearly defined roles—including Contributors, Committers, Project Leads, Core Project Leads, a Steering Committee, and a Vulnerability Group—to help ensure transparent collaboration while maintaining MySQL’s standards for quality, stability, compatibility, and security.

In the last quarter, we also published the MySQL Developer Guide, which describes how to effectively contribute and participate in the evolution of MySQL.

To catch up on previous discussions, see highlights from earlier sessions:

Edition #4 highlights (contributions and feature requests)
Edition #3 highlights (bugs and contributions)
Edition #2 highlights (ecosystem and metrics)
Edition #1 highlights (community roadmap)

Q5. How do you see the MySQL governance structure evolving to give the community a stronger voice while maintaining Oracle’s stewardship?

HV: The MySQL Governance model is a key part of how we are evolving community participation while maintaining Oracle’s long-term stewardship of the project. The model is built on principles of transparent processes, merit-based participation, shared stewardship, and a commitment to quality, stability, compatibility, and security.

Oracle remains the primary steward of MySQL while creating clearer pathways for the community to participate in shaping the project’s future. Community members can contribute through code, testing, documentation, bug reports, design discussions, and reviews. As contributors gain experience and demonstrate sustained engagement, they can take on greater responsibilities through defined governance roles.

The model also introduces a Steering Committee that brings together perspectives from Oracle, users, customers, hyperscalers, and the broader open source ecosystem to help guide long-term priorities, governance evolution, ecosystem growth, and community engagement.

Together with public roadmaps, Early Access releases, GitHub collaboration, contributor summits, and design meetings, the governance model creates a structured framework for community participation while preserving the engineering excellence and operational stability that organizations around the world depend on.

Q6. PostgreSQL has been gaining ground with features like pgvector for AI workloads, while MySQL faced criticism for lack of similar capabilities. How does Oracle plan to ensure MySQL remains competitive not just with PostgreSQL, but also with cloud-native databases and newer entrants in the database market?

JW: MySQL offers a uniquely predictable and stable operational model at global scale, combined with strong performance and ease of use. Backed by Oracle, it delivers enterprise-grade reliability while maintaining the flexibility and innovation of open source. We will continue to collaborate with the community to deliver innovations based on our published roadmap into MySQL Community Edition.

Q7. The move of MySQL into Oracle’s cloud organization raised concerns about resource allocation. Can you address these concerns and explain how Oracle is ensuring MySQL has the engineering resources it needs to execute on this new community-focused vision?

JW: MySQL’s success has always come from the combination of strong stewardship and a vibrant community. Oracle continues to invest deeply in both. What’s new is increased transparency, stronger engagement with the community, and more structured ways for contributors, partners, customers, and ecosystem participants to help shape MySQL’s future through the MySQL Governance model and related community programs.

These investments complement our continued engineering investment in MySQL Community Edition, MySQL Enterprise Edition, and MySQL HeatWave.

Q8. You’ve mentioned expanding collaboration with Linux distributions, particularly Canonical and Ubuntu, as well as supporting major open source projects like WordPress and Drupal. What does this ecosystem support look like in practice, and how will Oracle work with companies that some might consider competitors in the MySQL space?

HV: We have built relationships and lines of communication between the MySQL Community Team and open source maintainers to ensure a smooth path for those projects to build their software and platforms using MySQL. We continue to strengthen communications and remove barriers to collaboration.

That spirit of collaboration is reflected in the MySQL Governance model and community engagement efforts. The recent Contributor Summit brought together Oracle engineers and contributors from organizations including Amazon, Google, Percona, ProxySQL, Readyset, VillageSQL, and participants from across the broader MySQL ecosystem, including MariaDB, to share ideas and help shape the future of MySQL.

We continue to focus on growing and expanding the MySQL ecosystem, guided by the idea that a rising tide lifts all boats. Growing the community and bringing more collaboration and alignment makes us all stronger together and creates opportunities throughout the ecosystem.

Q9. For organizations currently running MySQL in production, what’s your message about long-term support and the roadmap? With MySQL 8.0 approaching end of life and MySQL 9.7 LTS published in April 2026, how should enterprises plan their migration strategies and what assurances can you provide about stability and backward compatibility?

The release introduces a new long-term support version of MySQL Community Edition and MySQL Enterprise Edition, along with expanded feature delivery into the core, early access capabilities, and the first phase of our enhanced transparency and community engagement model.

Q10. Looking beyond the immediate announcements, what is Oracle’s five-year vision for MySQL? How do you see MySQL evolving to meet the demands of AI workloads, cloud-native architectures, and modern developer expectations while preserving the simplicity and reliability that made it the world’s most popular open source database?

JW: MySQL powers everything from startups to hyperscale platforms. It’s used by companies like Uber and Booking, and underpins major platforms like WordPress and Ubuntu. That breadth of adoption is a strong validation of its reliability and scalability.

The vision is simple: build MySQL in the open with the community, accelerate innovation without sacrificing quality or stability, and continue to scale and grow the ecosystem around the world’s most widely used open source database platform.

A key part of that vision is establishing a sustainable governance framework that enables broader participation, develops future community leaders, and creates stronger connections between Oracle, contributors, customers, partners, hyperscalers, and the broader open source ecosystem.

Through transparent roadmaps, community-driven collaboration, contributor programs, and the MySQL Governance model, we aim to create an environment where innovation can accelerate while preserving the reliability, compatibility, security, and operational excellence that organizations around the world depend on.

Jason Wilcox
Senior Vice President, Data and AI Platform, Oracle Cloud Infrastructure (OCI)
Jason Wilcox leads the Data and AI Platform organization at Oracle Cloud Infrastructure (OCI), overseeing the design and development of OCI’s data platforms, AI infrastructure and platform services, and open source technologies. His portfolio spans cloud-scale data services, data processing and integration platforms, operational services for AI workloads, and widely adopted open source technologies that developers and enterprises rely on to build modern applications. These services help customers manage and use data, run AI workloads, and operate secure, reliable, and scalable systems on OCI.

Heather VanCura
Vice President, External Standards & Community Engagement, Oracle Cloud Infrastructure (OCI)
Heather VanCura is Vice President of External Standards & Community Engagement at Oracle, where she leads Java Community programs and the MySQL Community Outreach team. With over 20 years of experience at Oracle and Sun Microsystems, she is a central figure in the global ecosystem, focusing on community growth, engagement, and standardization efforts.

………………….

Jun 30 26

What I Didn’t Learn in Medical School: Mathias Goyen on AI, Judgment, and the Human Side of Healing

by Roberto V. Zicari

“When patients say that AI listens better than their doctor, they are rarely making a statement about empathy. They are making a statement about time.”

Q1. Your book (*) argues that medical schools teach the technical anatomy of disease but not the anatomy of human hopes and fears, that physicians learn to diagnose but not always to truly listen. As AI systems increasingly match or exceed physicians on the technical and encyclopedic dimensions of medicine, you suggest the physician’s role as a trusted human ally becomes more important, not less. But that humanistic competency is, by your own account, something doctors are largely left to acquire on their own, often through harsh and humbling experience.

What would it actually take to teach this deliberately, and do you believe medical education as an institution is capable of changing fast enough to do so before an entire generation of physicians has already been shaped by the system as it exists today?

Mathias Goyen: The arrival of AI has created a fascinating paradox. The more capable our technology becomes at processing information, the more valuable distinctly human capabilities become. For centuries, medicine has largely defined excellence through knowledge, diagnostic accuracy, and technical skill. Those qualities will always remain essential, yet they are no longer sufficient on their own because information has become increasingly accessible while judgment, trust, and the ability to accompany another human being through uncertainty have become the true scarce resources.

This is precisely where I believe medical education faces its greatest challenge. We still devote enormous energy to teaching students how diseases behave, yet comparatively little attention is given to how people behave when they become patients. We teach physiology, pathology, pharmacology, and anatomy with remarkable rigor, but far less time is spent understanding fear, uncertainty, hope, grief, or the psychological complexity that accompanies almost every serious diagnosis. Those aspects are often treated as something physicians will simply acquire through experience, as though compassion naturally emerges after enough years on the wards. Experience certainly matters, but experience alone is an unreliable teacher. Some physicians become wiser through it, while others simply become more efficient.

The encouraging news is that I do not believe these qualities are beyond teaching. What we cannot teach through lectures alone can be cultivated through deliberate exposure to complexity. Medical students should spend more time observing difficult conversations than memorizing another list of rare syndromes. They should regularly reflect on situations in which there was no perfect answer. They should discuss uncertainty with senior clinicians who are willing to admit that medicine is often practiced without complete certainty and that wisdom frequently consists of choosing responsibly among imperfect alternatives rather than identifying a single correct solution. In other professions, including aviation, the military, and executive leadership, reflection after difficult situations is considered an essential part of professional development. Medicine still tends to reward certainty even when uncertainty is the daily reality.

AI makes this transformation more urgent, not because it diminishes physicians, but because it changes where physicians create value. If machines increasingly become exceptional at organizing knowledge, physicians must become exceptional at helping people navigate that knowledge. Patients will continue to need someone who can interpret information within the context of an individual life, balance competing priorities, communicate honestly when certainty is impossible, and remain present when medicine reaches its limits. None of these responsibilities become less important because an algorithm is available. If anything, they become more central to the profession than ever before.

Can medical education change quickly enough? I believe it can, but only if we accept that the future physician requires a broader definition of competence than the one that has guided us for generations. Medical schools have repeatedly demonstrated their ability to adapt when science demanded it. We incorporated molecular biology, genomics, and digital medicine into our curricula because they became indispensable. We should now show the same determination in teaching judgment, communication, ethical reasoning, adaptability, and the ability to lead patients through uncertainty. These are not soft skills that merely complement medical expertise. In the age of AI, they increasingly define it.

Q2. Patients increasingly arrive at appointments having already consulted ChatGPT, Gemini, or Claude, sometimes reporting that “AI listens as no doctor did before.” That is a remarkable and uncomfortable statement about the current state of clinical encounters.

What does it actually mean, in practice, for a physician to compete with, or more usefully, to integrate, a technology that can offer patients more time, more patience, and the appearance of being heard, within a healthcare system that structurally cannot offer physicians the time to do the same?

Mathias Goyen: I actually do not believe physicians should think of themselves as competing with AI. The moment we begin framing the relationship as a competition, we have already misunderstood what patients are really telling us.

When patients say that AI listens better than their doctor, they are rarely making a statement about empathy. They are making a statement about time. AI never interrupts. It never looks at the clock. It never appears distracted by the next patient waiting outside the door. It allows people to finish their thoughts before responding. That experience alone can create a powerful feeling of being heard, even though the technology itself experiences neither compassion nor understanding.

That observation should not make physicians defensive. It should make us reflective. It forces us to ask a difficult question about our healthcare systems rather than about our technology. Have we gradually created an environment in which efficiency has become so dominant that patients increasingly value uninterrupted attention as much as medical expertise? I suspect the answer is yes.

Ironically, I see this development as an opportunity rather than a threat. If patients arrive having already explored their symptoms with AI, the consultation no longer needs to begin with the simple transfer of information. Instead, it can begin at a much more meaningful level. The physician can help patients interpret what they have learned, distinguish probable explanations from unlikely ones, place isolated facts into the context of an individual life, and openly discuss uncertainty where uncertainty genuinely exists. In other words, the conversation can move away from information retrieval and toward judgment.

This also changes the physician’s role in a subtle but important way. Historically, physicians often served as the primary source of medical knowledge. Increasingly, they become trusted interpreters of knowledge that is already available to everyone. I consider that an evolution rather than a loss. Trust has never depended simply on possessing information that others do not have. Trust emerges when someone helps us understand what information actually means for our own lives.

The real danger, therefore, is not that AI becomes too patient. The danger is that healthcare systems conclude that because AI can provide unlimited conversational capacity, human conversation becomes less necessary. That would fundamentally misunderstand why patients seek physicians in the first place. Patients do not simply come to receive answers. They come to share responsibility for decisions that may profoundly affect their lives. They want someone who can recognize when uncertainty remains, explain why different options carry different consequences, and occasionally say, “I do not know, but we will work through this together.” Those moments create trust in a way that no technology can replicate.

Ultimately, I hope AI will not replace the conversation between physician and patient but elevate it. If AI can assume much of the informational and administrative workload, physicians should have greater freedom to focus on the conversations that require wisdom rather than recall, presence rather than speed, and judgment rather than computation. Whether that vision becomes reality, however, depends far less on the technology itself than on the choices healthcare organizations make about how the time that AI creates is ultimately used.

Q3. You write that the physician today carries not only a stethoscope but also data, and that patients now often have access to the same data and the same AI tools that physicians have. This symmetry is historically unprecedented in medicine. For healthcare leaders and policymakers far beyond any single organization or technology vendor, what do you believe is the single most important structural or educational change needed to ensure that this newly symmetrical relationship between doctor and patient becomes a source of trust and collaboration, rather than confusion, distrust, or a further erosion of the time available for genuine human connection?

Mathias Goyen: If I had to identify a single priority, it would not be the introduction of more technology. It would be the deliberate cultivation of judgment as a shared competency between physicians and patients.

For most of medical history, knowledge itself created an asymmetry. Physicians possessed information that patients simply could not access. Today that asymmetry is rapidly disappearing. A patient can read scientific publications, access clinical guidelines, ask sophisticated questions of large language models, and arrive in the consultation having accumulated an extraordinary amount of information. That development should not be feared. An informed patient is not a threat to medicine. Quite the opposite. Mutual understanding has always been the foundation of shared decision making.

The challenge is that access to information is not the same as the ability to interpret it. Modern medicine generates probabilities rather than certainties. Imaging findings require clinical context. Laboratory values depend on medical history. Risk predictions require value judgments about what matters most to an individual patient. AI can organize remarkable amounts of information, yet deciding what deserves attention, what uncertainty remains acceptable, and which option best reflects a person’s preferences continues to require human judgment.

This is why I believe healthcare systems should move beyond thinking primarily about digital literacy and begin focusing much more deliberately on decision literacy. Physicians need to become better at explaining uncertainty without undermining confidence. Patients need to become more comfortable understanding that medicine rarely offers absolute answers and that reasonable experts can occasionally reach different conclusions while acting in good faith. Trust grows when uncertainty is acknowledged honestly rather than hidden behind artificial certainty.

There is also an important leadership responsibility. Healthcare organizations should resist the temptation to measure success exclusively through productivity metrics, waiting times, or numbers of consultations completed. Those indicators matter, but they tell us remarkably little about whether patients actually understood the decisions that were made together. If AI allows us to create more informed patients but simultaneously leaves physicians with even less time to interpret information collaboratively, we will have solved the wrong problem.

Ultimately, I believe the relationship between physicians and patients is becoming less hierarchical and more collaborative. That is one of the most profound cultural shifts medicine has experienced in generations. The physician’s authority will increasingly arise less from exclusive access to knowledge than from the ability to guide thoughtful decisions in situations where knowledge alone is insufficient. Patients, in turn, become active participants rather than passive recipients of care. I consider that an extraordinarily positive development because trust built through partnership is ultimately stronger than trust built through dependency.

If we succeed in creating healthcare systems that value dialogue as much as diagnosis, AI may become one of the greatest opportunities medicine has seen in decades. If we fail, we may discover that we improved the flow of information while unintentionally weakening the relationships that give information its meaning.

Q4. You made the unusual transition from practicing physician and academic radiologist to global Chief Medical Officer inside a major commercial MedTech company. That move places you in a position that carries an inherent tension: the Hippocratic commitment to the patient’s interest above all else, and the commercial reality of an organization whose technologies must ultimately sell and generate returns for shareholders.

How do you personally navigate that tension in your day-to-day decisions, and what would you say to a young physician who is skeptical that genuine humanistic medicine and a senior leadership role inside a commercial healthcare company can coexist with integrity?

Mathias Goyen: This question assumes a tension that certainly exists, yet perhaps not in quite the way many people imagine. Throughout my years in industry, I rarely experienced the debate as one between doing what is right for patients and doing what is right for the business. More often, I experienced it as a question of time horizon.

Healthcare is unusual because trust accumulates slowly and can disappear remarkably quickly. Physicians recommend technologies because they believe they improve patient care. Hospitals invest because they expect meaningful clinical value over many years. Companies build reputations over decades rather than quarters. When viewed from that perspective, serving patients well and building a successful business are not competing objectives. They are deeply interconnected. Commercial success becomes sustainable only when clinicians genuinely believe that a company helps them care for patients more effectively.

As a physician working inside industry, I always considered my role somewhat different from many other leadership positions. I was not there to replace commercial thinking. I was there to complement it with clinical perspective. Every discussion about product development, workflow, AI, education, or implementation eventually led me back to the same questions. Does this solve a meaningful problem? Will it make a physician’s work easier, more thoughtful, or safer? Will it ultimately improve the patient’s experience or outcome? Those questions do not eliminate difficult business decisions, but they provide a remarkably consistent compass.

I also learned something that surprised me. Before joining industry, I imagined companies primarily as organizations that develop technologies. Over time I came to realize that their greatest influence often lies elsewhere. They shape education. They convene experts from around the world. They invest in research. They help translate scientific discoveries into everyday clinical practice. They create ecosystems that individual hospitals or universities could rarely establish on their own. When those responsibilities are approached thoughtfully, industry becomes an important partner in advancing healthcare rather than merely supplying it.

To a young physician who is skeptical about entering industry, I would simply say that integrity does not depend on the logo on your business card. It depends on whether you remain intellectually honest about whom your decisions ultimately serve. Good people can make poor decisions in universities, hospitals, governments, or companies. Equally, principled leadership can exist in all of those environments. The ethical responsibility travels with the individual rather than the institution.

Perhaps my years in industry strengthened rather than weakened my belief in humanistic medicine. They allowed me to appreciate healthcare from perspectives that are rarely visible inside a single hospital. I saw how engineers, software developers, regulatory experts, clinical scientists, economists, policymakers, and physicians all contribute different forms of expertise to the same objective. Modern healthcare has become far too complex for any profession to improve it alone.

That realization also changed my understanding of leadership. Leadership is less about representing one profession than about creating conditions in which very different professions can solve meaningful problems together. In many ways, that lesson reflects the central message of my book. Medicine has always been a profoundly human endeavor, yet increasingly it is also a collaborative one, and our responsibility is to ensure that scientific innovation, commercial innovation, and human values continue to move in the same direction.

Q5. Looking back at your own career, from clinical practice and academic medicine to international hospital leadership to your current global role, what is the moment or experience that most changed how you understand what patients actually need from a physician, the kind of moment that you suspect could never be fully captured in a textbook, a curriculum, or, for that matter, in an AI system? And on a personal note: is there a particular patient encounter from your years of clinical practice that you still think about today, and that shaped the convictions behind this book?

Mathias Goyen: I find it surprisingly difficult to identify a single defining moment. That is perhaps because medicine rarely changes us through dramatic events alone. More often, it changes us gradually, almost imperceptibly, through hundreds of encounters that quietly reshape how we think about illness, responsibility, and the privilege of caring for another human being.

When I was younger, I believed that patients primarily came to physicians seeking answers. Over time, I realized that many came for something more subtle. They were looking for orientation at moments when life had suddenly become uncertain. A diagnosis changes far more than a person’s health. It interrupts a biography. It changes how people think about their future, their family, their work, and sometimes even their identity. Physicians naturally focus on understanding the pathology, while patients are often trying to understand what remains possible in their lives. Those are very different questions.

That realization gradually changed my own consultations. I became less concerned with demonstrating how much I knew and more interested in understanding what the patient was actually asking. Quite often, the question spoken aloud was not the most important one. Behind a technical question about an MRI finding or a treatment option there was frequently another question that remained unspoken. Will I still be able to care for my children? Will I become dependent on others? Am I going to die? Learning to recognize those questions probably changed my practice more than any scientific publication I ever read.

For that reason, I cannot point to a single patient who shaped this book. Instead, the book represents a conversation with many patients whose names I no longer remember but whose concerns I still do. They taught me that medicine is practiced simultaneously on two levels. One level concerns disease, where science rightly guides our decisions. The other concerns the human experience of illness, where listening often matters as much as explaining. Medical school prepared me exceptionally well for the first. The second was learned almost entirely through experience.

This is also why I believe some aspects of medicine will always resist complete automation. AI may become remarkably effective at recognizing patterns, generating differential diagnoses, or summarizing scientific evidence. Those capabilities will undoubtedly improve healthcare. Yet every patient enters the consultation carrying a unique life story that gives medical facts their meaning. Two patients with identical diagnoses may require entirely different conversations because their fears, priorities, relationships, and hopes are profoundly different. Understanding that difference requires something more than information processing. It requires curiosity about another human being.

If I were to distill one lesson from my years in medicine, it would be this. Patients rarely remember every detail of what we explained. They often remember whether they felt safe while facing uncertainty. Today, I believe that helping patients feel safe while facing uncertainty may have been one of the most important responsibilities I ever had as a physician, even though it was never formally described in any curriculum I followed.

Q6. You describe healthcare systems as “drowning in bureaucracy” and structured around appointment slots and documentation requirements that consume the time meant for genuine conversation. AI is often proposed as the solution to exactly this problem through ambient documentation, automated coding, and administrative automation. But there is a real risk that the time saved by AI is simply absorbed by the system rather than returned to the patient-physician relationship.

What evidence, if any, have you seen that AI-driven efficiency gains in healthcare actually translate into more human time at the bedside rather than simply more throughput, and what would need to change structurally to ensure the former rather than the latter?

Mathias Goyen: This is one of the most important questions surrounding AI in healthcare because it reminds us that technology alone cannot determine how its benefits are ultimately used. Technology creates possibilities. Organizations decide whether those possibilities become reality.

There is growing evidence that ambient documentation, intelligent summarization, and administrative automation can reduce the time physicians spend interacting with computers. That is encouraging and represents meaningful progress. At the same time, I believe we should be careful not to make a logical leap that the current evidence does not yet fully support. Saving documentation time does not automatically mean that physicians spend more time with patients. In many healthcare systems, the newly available capacity is immediately redirected toward seeing more patients, completing more documentation, or fulfilling additional administrative requirements. Efficiency is created, but humanity does not necessarily increase.

This distinction is crucial because the true value of AI should not be measured simply by minutes saved. It should be measured by what those minutes become. If every efficiency gain is automatically converted into higher throughput, we may discover that physicians become more productive while patients do not feel more cared for. That would be a remarkable paradox. We would have built technology capable of giving time back to medicine without actually giving time back to the people who practice it.

I therefore believe that the successful implementation of AI is ultimately a leadership challenge rather than a technology challenge. Leaders need to decide explicitly what they want AI to achieve. Is its primary purpose to maximize productivity? To improve quality? To reduce burnout? To strengthen the physician-patient relationship? These objectives overlap, yet they are not identical, and organizations that never articulate their priorities often discover that efficiency silently becomes the default objective.

This also requires us to rethink how we evaluate success. Healthcare has become exceptionally good at measuring activity. We know how many patients were seen, how many procedures were performed, and how long documentation required. We are much less sophisticated at measuring whether physicians had enough time to explain uncertainty, whether patients genuinely understood their options, or whether trust improved during the consultation. Those dimensions are more difficult to quantify, yet they may ultimately matter far more.

Perhaps the most profound opportunity offered by AI is not that it enables physicians to work faster. It is that it offers us a rare opportunity to decide what medicine should do with the time it recovers. That is not a technical question. It is an ethical and organizational one.

If we consciously return even part of that time to thoughtful conversation, shared decision making, and the human aspects of care that have gradually been crowded out by bureaucracy, AI may become one of the greatest restorations of humanism that modern medicine has experienced. If we simply use it to accelerate an already overloaded system, we should not be surprised if physicians continue to feel exhausted despite having better technology.

Q7. You write about situations where physicians cannot save everyone, where they are powerless and have no answer to a patient’s question, and about the power of silence as something that can be as comforting as words. These are precisely the situations where AI, by its nature, cannot help: it has no silence to offer, no genuine powerlessness to share with another human being. As AI takes over more of the technical and diagnostic burden, do you believe physicians will have more capacity to be present in these irreducibly human moments, or do you worry that a healthcare system optimized around AI efficiency will paradoxically squeeze out exactly the kind of presence that cannot be automated?

Mathias Goyen: I think the answer depends far less on AI than on ourselves. Technology does not decide what medicine values. We do.

There is an understandable tendency to imagine that if AI assumes more technical work, physicians will naturally have more time for the deeply human moments that accompany serious illness. I sincerely hope that proves true, yet I do not believe it is inevitable. Healthcare has a long history of converting efficiency gains into additional activity rather than additional presence. Unless we consciously choose otherwise, the same pattern could easily repeat itself.

That would be deeply unfortunate because the moments you describe are, in many ways, the essence of medicine. There are conversations in which physicians have no treatment left to offer, no reassuring certainty, and no words capable of removing another person’s suffering. Yet those encounters are not failures. Sometimes the most meaningful thing a physician contributes is simply the willingness to remain present when uncertainty, fear, or grief cannot be taken away.

One of the lessons I learned during my clinical years is that patients rarely expect physicians to solve every problem. They understand, often better than we imagine, that medicine has limits. What they hope for is that those limits are not faced alone. Presence is therefore not the absence of action. Presence is itself a form of care.

This is one reason why I hesitate whenever discussions about AI become dominated by questions of replacement. The real opportunity is not to replace physicians in the human dimensions of medicine. It is to relieve physicians of tasks that never required their uniquely human capabilities in the first place. Every administrative burden that disappears creates the possibility of another conversation. Every repetitive task that becomes automated creates the possibility of another moment of attention. Whether those possibilities become reality depends entirely on the values embedded within the healthcare system.

There is another aspect that deserves attention. Physicians themselves often struggle with silence because medical education teaches us to respond, explain, and intervene. Yet some of the most memorable consultations occur precisely when no explanation is sufficient. Sitting quietly with a patient after delivering difficult news can communicate honesty, solidarity, and respect in ways that language sometimes cannot. Those moments may appear unproductive from the perspective of operational efficiency, yet they are profoundly productive from the perspective of healing, even when cure is no longer possible.

Perhaps the ultimate purpose of AI is not to make physicians less necessary, but to allow them to become more fully what only they can be. If AI helps restore the time and emotional space for conversations that have gradually been displaced by documentation, administration, and fragmented workflows, it will have achieved something far greater than efficiency. It will have helped medicine recover an essential part of its own identity.

Whether that future emerges is not a technological question. It is a cultural choice.

Q8. Burnout among physicians is one of the most consistent findings in healthcare workforce research worldwide, and you connect it directly to the gap between what medical school promises and what the healthcare system actually delivers. As AI becomes more capable and more embedded in clinical workflows, there is a genuine debate about whether it will reduce physician burnout by removing administrative burden, or increase it by adding new layers of complexity, oversight responsibility, and the cognitive load of constantly evaluating AI-generated recommendations. Based on what you are seeing across health systems globally, which direction do you believe is more likely to dominate over the next five years, and what would tip the balance one way or the other?

Mathias Goyen: I believe AI has the potential to reduce physician burnout, but only if we first become more precise about what we actually mean by burnout.

Physicians are certainly exhausted by documentation, fragmented workflows, repetitive administrative tasks, and the growing complexity of modern healthcare. AI can help address many of those burdens, and I am optimistic that it will. Intelligent documentation, decision support, and automation of routine activities can remove work that adds little professional satisfaction while consuming enormous amounts of time and attention.

Yet there is another dimension of burnout that receives less attention. Many physicians are not simply tired because they work hard. They are tired because they increasingly spend less time doing the very work that originally inspired them to enter medicine. Most young physicians do not dream of becoming experts in documentation, coding, or navigating digital systems. They choose medicine because they want to solve problems, accompany patients, and make meaningful decisions during important moments in people’s lives.

If AI merely changes the nature of administrative work while leaving physicians equally disconnected from the human purpose of their profession, I doubt burnout will improve substantially. We may create more efficient workflows without restoring professional fulfillment. Excessive workload certainly contributes to burnout. Equally important is the gradual loss of meaning that many physicians experience when they spend less and less time practicing the kind of medicine that drew them into the profession.

This is why implementation matters so profoundly. Across healthcare systems around the world, I have seen remarkable enthusiasm for AI, yet successful implementation rarely depends on the sophistication of the algorithm alone. It depends on whether clinicians trust the technology, understand its limitations, feel appropriately involved in its introduction, and experience it as genuine support rather than additional oversight. AI should reduce cognitive burden rather than simply replacing one form of complexity with another.

I therefore believe the next five years will be determined less by technical progress than by implementation quality. Organizations that introduce AI thoughtfully, redesign workflows, invest in education, and deliberately return time to physicians will likely see meaningful improvements in professional satisfaction. Organizations that simply layer AI onto already overloaded systems may discover that physicians now carry responsibility for both their own decisions and the continuous evaluation of algorithmic recommendations, while still remaining accountable for every outcome. That would increase complexity rather than reduce it.

Perhaps the most important lesson is that burnout cannot be solved by technology alone because its origins are not purely technological. Burnout emerges when physicians gradually lose the connection between their daily work and the deeper purpose that drew them into medicine in the first place. AI can help restore that connection by removing unnecessary burdens, but it cannot create meaning on behalf of the profession. That remains our responsibility.

In the end, I remain cautiously optimistic. If we implement AI with the explicit intention of helping physicians spend more of their professional lives practicing medicine rather than managing medicine, I believe it can become one of the most important contributors to physician well-being that we have seen in many years.

Q9. You suggest that medical education should teach adaptability, agility, and tolerance for ambiguity as core competencies for physicians navigating an era of technological disruption. These are qualities that are notoriously difficult to teach in a classroom and are often only developed through direct experience with uncertainty and failure. What concrete pedagogical approaches, whether from medicine or from other fields entirely, do you believe could actually cultivate these qualities in medical students, rather than simply naming them as desirable traits in a curriculum document?

Mathias Goyen: One of the greatest misconceptions in education is the belief that judgment can simply be transferred from one generation to the next through lectures. Knowledge can be taught remarkably efficiently. Judgment develops differently. It emerges through reflection on experience, through exposure to uncertainty, and through the gradual realization that many important decisions do not have perfect answers.

Medicine has traditionally rewarded certainty. Students spend years learning that there is a correct diagnosis, a correct treatment, and a correct examination answer. That approach has obvious value because medicine must remain scientifically rigorous. Yet the reality of clinical practice is often very different. Patients rarely present exactly as described in textbooks. Several reasonable treatment options may exist simultaneously. Evidence may be incomplete. Individual values may legitimately lead to different decisions. Physicians therefore need to become comfortable making thoughtful decisions even when certainty is unattainable.

I believe this can be taught, but only if medical education changes what it chooses to reward. We should spend more time discussing cases where experienced physicians disagreed respectfully, where outcomes remained uncertain despite excellent care, and where ethical dilemmas had no universally accepted solution. Reflection after difficult clinical encounters should become as normal as learning anatomy or pharmacology. Students should not only ask, “What happened?” They should also ask, “How did I think? Why did I make this decision? What uncertainty did I overlook? What would I do differently next time?” Those questions gradually cultivate professional judgment.

Some of the most valuable lessons may also come from outside medicine. Aviation has developed a remarkable culture of structured debriefing in which experienced professionals openly analyze mistakes without automatically assigning blame. Elite sports recognize that performance improves through deliberate reflection rather than repetition alone. Military leadership education acknowledges that leaders frequently make decisions with incomplete information and changing circumstances. Executive leadership increasingly emphasizes adaptability, self-awareness, and the ability to learn continuously as environments evolve. Medicine can learn from all of these disciplines because uncertainty is not unique to healthcare. It is a defining characteristic of leadership itself.

AI adds another important dimension. Future physicians will increasingly practice alongside systems that generate sophisticated recommendations within seconds. Their responsibility will therefore shift further toward evaluating, interpreting, communicating, and occasionally questioning those recommendations. Medical education should prepare students for that future by encouraging intellectual humility rather than intellectual certainty. The most valuable physician will not necessarily be the one who memorizes the greatest number of facts, but the one who consistently asks thoughtful questions, recognizes the limits of available knowledge, and remains willing to revise conclusions when new evidence emerges.

Perhaps that is ultimately what adaptability means. It is not the ability to change one’s opinion easily. It is the willingness to continue learning throughout an entire professional life without losing sight of the values that make medicine a profoundly human profession.

If I could change one thing about medical education, it would be this. We should spend less time asking students whether they know the answer and more time exploring how they reached it. In the age of AI, the quality of our reasoning may become more important than the quantity of our knowledge.

…………………………………………………

Prof. Dr. med. Mathias Goyen is a physician, radiologist, professor, author, and international healthcare executive. After many years in academic medicine and clinical practice, he served for almost fifteen years in global medical leadership at GE HealthCare, most recently as Global Chief Medical Officer for Imaging. Throughout his career, he has worked at the intersection of medicine, AI, leadership, and healthcare transformation. He is the author of What I Didn’t Learn in Medical School: Notes on Medicine, Leadership & the Human Side of Healing, which explores the human side of medicine in an age of AI. He is also Co-Founder of HelloAI, an educational initiative dedicated to responsible AI in healthcare.

Relevant Links

(*) What I Didn’t Learn in Medical School: Notes on Medicine, Leadership & the Human Side of Healing

HelloAI

………………….


Jun 15 26

Trust Is Not a Feeling: Nuno Galante Valério on Engineering Accountability into AI for High-Stakes Healthcare

by Roberto V. Zicari

“On Innovation” series

“The way most AI conversations use “trust,” it names a feeling – and you can’t engineer a feeling.”

Q1. What do the builders of AI consistently fail to understand about deploying their work in a GxP environment, where the cost of being wrong is measured in patient safety?

Nuno Galante Valério: If I have to choose one thing: they don’t feel the distance between a demo that works and a system you can deploy. That distance is the entire job, where the whole effort is. It’s where I’ve spent my career.

I’ve sat through this meeting many times: a vendor, or one of our own teams, shows me something that genuinely impresses the room. The model reads a batch record, finds the deviation, drafts the CAPA, and does it faster and more carefully than the person who used to. Someone says the word “production-ready,” and means it. So, I ask them to run it again, same input. They do, and what comes back is almost the same. A sentence in a different order. A risk worded a little differently. A reference that was there the first time and, the second time, quietly isn’t. The mood in the room changes, because everyone understands at once that “almost the same” is not something you can write into a validation report and put your name under.
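The “run it again, same input” check is easy to script. What follows is a minimal sketch, not a description of any system mentioned in the interview: `generate` is a hypothetical stand-in for whatever function calls the model, and the similarity measure (Python’s standard `difflib`) is just one possible choice.

```python
import difflib
from typing import Callable, Dict


def repeatability_report(generate: Callable[[str], str],
                         prompt: str,
                         runs: int = 5) -> Dict[str, object]:
    """Call `generate` several times on the same prompt and summarize variation.

    `generate` is a hypothetical callable wrapping the model under test.
    """
    outputs = [generate(prompt) for _ in range(runs)]
    reference = outputs[0]
    # Character-level similarity of every later run against the first run.
    similarities = [
        difflib.SequenceMatcher(None, reference, other).ratio()
        for other in outputs[1:]
    ]
    return {
        "identical": all(o == reference for o in outputs),
        "distinct_outputs": len(set(outputs)),
        "min_similarity": min(similarities) if similarities else 1.0,
    }
```

A report with `identical` set to `False` is not a verdict on the system. It only makes “almost the same” visible, so that the variation can be argued about explicitly.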

Now, the easy lesson to draw from that room is the wrong one – that generative systems are too unstable to let near anything that matters. Europe’s first instinct, in its draft guidance for AI in manufacturing, was close to that: keep these models away from critical operations. The part I find genuinely interesting is that the direction is already moving off it, toward a risk-based view, and I think that correction is right. It turns on a distinction the builders almost never start from: risk is a property of the function, not of the technology. A frozen, deterministic model making a release decision with nobody checking it is more dangerous than a probabilistic one drafting something a qualified person reviews before it goes anywhere. The variation I provoked in the room was never the hazard; the hazard is letting any output, stable or not, reach a place you can’t walk it back from, without a control built to catch it. It’s why, when my team sizes up an AI use, the first questions aren’t about the model at all – they’re how critical the function is, how much the thing decides on its own, and whether we’d even notice it going wrong.
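The three sizing questions above (criticality of the function, degree of autonomy, and whether a failure would be noticed) can be sketched as a simple triage score. This is an illustration only, loosely in the spirit of an FMEA-style risk priority number. The three-level scale, the multiplication, and all names are assumptions, not the method used by the interviewee’s team.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class AIUse:
    name: str
    criticality: Level    # how critical is the function?
    autonomy: Level       # how much does the system decide on its own?
    detectability: Level  # HIGH means a failure would be noticed easily


def risk_score(use: AIUse) -> int:
    """Hypothetical triage score from 1 (lowest) to 27 (highest)."""
    # Poor detectability raises risk, so invert it: HIGH -> 1, LOW -> 3.
    undetectability = 4 - use.detectability
    return use.criticality * use.autonomy * undetectability


# Illustrative ratings for the two cases contrasted in the text.
unchecked_release = AIUse("release decision with nobody checking",
                          Level.HIGH, Level.HIGH, Level.LOW)
reviewed_draft = AIUse("draft reviewed by a qualified person",
                       Level.MEDIUM, Level.LOW, Level.HIGH)
```

Nothing in `AIUse` records whether the model is deterministic or probabilistic. In this sketch the score depends only on the function and its controls, which is the distinction the answer draws.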

Here is what the builders are actually missing, and they miss it because everything in their world rewards them for missing it. They optimize for capability – can the system do the task, well, fast. The regulated world doesn’t start there. It starts somewhere stranger: can you tell me, in advance and in writing, the edge of what this thing will do, so that inside the edge I’m never surprised, and outside it I can prove I had something in place to catch it. And the failure that keeps me awake isn’t the one the demo shows. The demo shows what the model catches. I’m paid to worry about what it misses, because a miss in my world doesn’t raise its hand. A false alarm announces itself and someone investigates; a missed signal just sits there, looking like nothing happened.

So, the failure isn’t really technical. Most of these people are far better engineers than I’ll ever be. What they haven’t done, what they’ve never been asked to do, is be the person whose name goes on the line that says I am accountable for what this does in front of a patient, and for what it fails to do. If you’ve never had to sign that, “it works” feels like the finish. Once you have, “it works” is maybe halfway, and the easy half. The other half has no demo in it. It’s building the argument for why the risk that remains is acceptable, and then defending that argument to an inspector whose job is to assume you got it wrong.

I don’t say this to be hard on them; you can’t really know it until you’ve lived it. I say it because the most interesting work in the field right now is sitting in that gap, between “it works” and “I’d stake my name on it,” and almost nobody upstream has noticed the gap is even there.

Q2. Give us a concrete example where the governance process was itself the site of genuine innovation – where something was invented that would not have existed without it.

Nuno Galante Valério: The honest, real version of this starts with a failure, because the useful thing came out of the failure itself.

We had a system – document-grounded, retrieval-based, the kind that answers a quality question by pulling from a controlled procedure corpus rather than from the model’s own memory. By every measure we had, it passed. Retrieval was solid, the prompts frozen, the version pinned, the test cases green. The validation evidence was complete. And as the process owner, I wouldn’t give my sign-off. Not because I could point at a defect (I couldn’t, the validation was clean) but because “the protocol passed” and “I’ll stand behind this running in my process for the next eighteen months” are not the same statement, and the second one is what my signature actually carries.

Sitting in that gap is what recently sent me to Petri Pohjanen. He’d spent years in automotive functional safety – ISO 26262, the world where software steers a moving car and a wrong output is a crash (not a typo) – and he’d held release authority, so he had personally signed the kind of statement I was hesitating over. Automotive had already solved, twenty years ago, a version of the exact thing I was stuck on: how do you take responsibility for a system you can’t test exhaustively. Their answer was never to make it deterministic. It was the safety case: a structured, layered argument that the risk of failure that remains is low enough to accept, with evidence under each layer. I’d been trying to discharge with a test report something that was only ever going to yield to an argument.

What came out of working together we called the Layered Assurance Stack, work that Petri and I are still developing in the open. Three layers that deliberately don’t collapse into one another. The first is what the system is allowed to do in the first place. The second is how it can fail in ways that have nothing to do with a broken component (this is where automotive’s SOTIF thinking carries over, the failures that come not from a part breaking but from the system meeting a situation outside the assumptions it was designed around). The third is what has to exist inside the organization to catch those failures while it’s running. Run the three together, and you get a proportionality result: how much assurance this particular use, in this particular context, actually needs. We gave the result a name and a set of tiers, but honestly the name is the least interesting part of it. The moment you name a tier, people start treating it as a standard instead of as the answer to a question, and the thinking stops.

Here’s the part that wouldn’t exist without the governance problem forcing it. What pharma was missing was never a better test. It was a language for arguing about probabilistic systems that an auditor can actually follow. The field had two reflexes: set the temperature to zero, and pretend you’ve made the thing deterministic; or refuse to deploy at all – and both are answers to a question nobody should be asking.

There was nothing in between, so we had to build the in-between. And the only reason we could is that I’d hit a wall where my existing tools told me a system was fine and my own judgment told me it wasn’t, and I refused to settle that by trusting the tools over the judgment.

The cost of it, since this series is about honesty and not press releases: it’s slow. It needs a certain organizational maturity. It needs you to disagree, sometimes sharply, with people you respect. And it needs patience to build at any real scale. The vocabulary is further along than the adoption, right now. Closing that distance is the part still in front of me and many of my peers.

Q3. ICH E6(R3) and the broader GxP framework assume deterministic, validated software. Generative AI is probabilistic and non-deterministic. How are you and your peers actually handling that tension in practice – not in principle?

Nuno Galante Valério: In practice, it gets handled by moving what you validate, which is a far quieter answer than the public debate would suggest.

The initial instinct is to ask how you validate the model. That question has no good answer, because the model is the part that won’t hold still. So, the people doing this seriously validate something else: the process made of a human and a system together, with the model sitting inside a control envelope as one component, rather than being the thing on trial. You don’t qualify the language model. You qualify the workflow around it – a person of defined competence reviewing the output against a defined standard, with the boundaries written down and the failure modes named before you start. The model is allowed to be probabilistic, as long as the process containing it is controlled. And that isn’t a dodge. It’s the same move we’ve always made with people: we never validated the analyst’s mind; we validated the procedure the analyst worked inside, because the analyst was fallible too and we knew it.

The second shift is harder, and it’s the really unsettling one – the move from validating at a point in time to monitoring over time. Classical validation works as a photograph. You show the system was right on the day you tested it, then you freeze it. But there’s a thread in the interpretability research, Anthropic’s included, about the gap between the reasoning a model states and the computation it actually performs. Take that seriously enough, and the photograph stops meaning much. If a system can drift, and if the reasons it gives you aren’t reliably the reasons it acted on, then proving it was correct on day one tells you very little about day two hundred. Validation has to become something closer to surveillance. You’re not proving correctness once; you’re sampling for it, continuously, against a population of inputs that keeps moving under you.
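One established way to “sample for correctness” over time is the p-chart from statistical process control: review a fixed-size sample of outputs each period, count the ones a qualified reviewer rejects, and compare that rate against three-sigma limits derived from a baseline. The sketch below assumes such a review process exists. The function names and the numbers in the usage note are hypothetical and are not taken from the interview.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PChartLimits:
    center: float
    lower: float
    upper: float


def p_chart_limits(baseline_failures: int,
                   baseline_reviewed: int,
                   sample_size: int) -> PChartLimits:
    """Three-sigma limits for the proportion of rejected outputs per sample."""
    p_bar = baseline_failures / baseline_reviewed
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot leave [0, 1], so clip the limits to that range.
    return PChartLimits(
        center=p_bar,
        lower=max(0.0, p_bar - 3 * sigma),
        upper=min(1.0, p_bar + 3 * sigma),
    )


def out_of_control(failures: int, sample_size: int,
                   limits: PChartLimits) -> bool:
    """True when a period's rejection rate falls outside the control limits."""
    rate = failures / sample_size
    return rate > limits.upper or rate < limits.lower
```

For example, with a hypothetical baseline of 20 rejected outputs in 1,000 reviewed and weekly samples of 100, `p_chart_limits(20, 1000, 100)` puts the upper limit near 6.2 percent, so a week with 8 rejections would signal while a week with 3 would not. A signal is a prompt to investigate, not proof of drift, and the baseline itself has to be revisited as the input population moves.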

That points at a role with no name yet, which I think is the single most important unbuilt thing in the field. Some hybrid of quality assurance and data science – a person who can read a control chart and a model card with equal fluency, who watches a production AI system the way a process engineer watches a control strategy. That person isn’t on the pharma org chart, yet. The data scientists rarely think in GxP (actually, often avoid it) and the quality people rarely think in distributions, so whoever holds both frames at once, has usually arrived there by accident. Somebody is going to have to build that into a profession on purpose.

So, the honest answer to “how are you handling it”: imperfectly, and by learning as we go. The frameworks haven’t caught up, so for now it’s people building the bridge while they’re standing on it. Uncomfortable. It’s also, I’d argue, the fastest way to find out what the bridge actually has to carry.

Q4. You lead a “trust architecture” for AI in GxP. What does trust actually mean as an engineering requirement – how do you decompose it into properties that can be specified, tested, monitored, and maintained?

Nuno Galante Valério: I’d start by taking the word back from itself, because the way most AI conversations use “trust,” it names a feeling – and you can’t engineer a feeling. What you can engineer are the conditions that make the feeling unnecessary. A patient swallows a tablet without auditing the supply chain behind it. Not because they’ve decided to believe in it, but because a century of architecture has already absorbed the complexity, so they don’t have to. That absorbed, invisible structure is what trust actually is, once you stop treating it as an emotion. And notice where it lives: not in the tablet, but in everything standing behind it. With AI it’s the same, and it’s the whole reason I named the work the way I did: the trust that matters was never going to live inside the model. It lives in what you build around it.

So, the question I work on is: what does a system have to do, structurally, before it earns that kind of invisibility. Looking across pharmaceutical regulation, aviation, banking, nuclear, food safety, the blood supply, the machinery of courts and professions – seven functions kept reappearing. Not because they’re the only things present in any one regime, but because their absence is what turns up in the post-mortem whenever trust collapses. Thalidomide was a surveillance failure. The 2008 financial crisis was a failure of provenance and verifiability. Tuskegee – men left untreated for a disease that had a cure – was a failure of recourse. Each one fails in its own characteristic way, and the mature version of every trust regime is, if you look closely, the scar tissue from once having been missing that function.

The seven are provenance, verifiability, accountability, reversibility, legibility, recourse, and surveillance. Rather than march through all seven, I’ll share how they group, because the grouping is what does the work. Provenance and verifiability are the is-it-what-it-claims pair: can you trace every component to its origin, and can someone not aligned with the maker check the claims independently. For most production AI in 2026, the honest answer to both is “not really” – we often cannot say who labelled the training data, or under what consent, and frontier evaluation is largely self-reported by the lab that trained the model, on benchmarks it partly designed. Accountability and reversibility are the can-it-be-answered-for-and-undone pair. Legibility and recourse are the can-the-affected-human-see-it-and-get-a-remedy pair. And surveillance stands alone – the population-level function, that catches the slow, aggregate harm that no single user would ever notice in themselves.

People ask why seven, and not five or nine. Because seven is the smallest set that survives a comparative test. Drop one and you find you’ve fused two functions that do genuinely different jobs; add one and you’ve split a function into halves that were never really independent. I’m not claiming it’s the only taxonomy anyone could draw. I’m just claiming you can’t remove a piece without losing something you needed, or add one without repeating yourself. That’s a falsifiable claim, which is the most I can honestly offer – and I’d be glad to be proven wrong.

Where it gets interesting is that GxP doesn’t weight the seven evenly. The three that pharma tends to underbuild are, awkwardly, the three that decide whether AI is deployable at all.

Surveillance is the one the non-determinism question kept circling. Point-in-time qualification is just a photograph; a system that can drift needs continuous monitoring against a population that moves. Pharma already knows how to do this for drugs – it’s called pharmacovigilance. It just hasn’t started doing it for models.
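
As an illustration of what continuous monitoring against a moving population could look like, here is a minimal sketch of a control chart over sampled human reviews. It is not a validated method: the baseline error rate, the sample size, the three-sigma limit, and the names `upper_control_limit` and `flag_batches` are all assumptions chosen for the example.

```python
import math

def upper_control_limit(baseline_rate, sample_size, sigmas=3.0):
    """Upper limit of a p-chart: baseline rate plus `sigmas` standard errors."""
    std_err = math.sqrt(baseline_rate * (1.0 - baseline_rate) / sample_size)
    return baseline_rate + sigmas * std_err

def flag_batches(error_counts, sample_size, baseline_rate):
    """Return the indices of review batches whose error rate exceeds the limit."""
    limit = upper_control_limit(baseline_rate, sample_size)
    return [i for i, errors in enumerate(error_counts)
            if errors / sample_size > limit]

# Hypothetical data: errors found by reviewers in weekly samples of 200 outputs.
weekly_errors = [4, 3, 5, 6, 4, 15]
flagged = flag_batches(weekly_errors, sample_size=200, baseline_rate=0.02)
print(flagged)  # [5]: only the last week is outside the limit
```

The point of the sketch is the shape of the evidence: no single week proves the system correct, but a week that leaves the control limits is a signal that someone has to investigate.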

Reversibility almost nobody builds, and in a regulated setting it’s unforgiving, because so many of the actions an AI touches can’t be taken back. You can recall a batch. You cannot easily un-make a decision that’s already propagated into a regulatory submission or a patient’s record. So, reversibility here is less an “undo button” and more a question: “is there a containment boundary that catches a wrong output before it becomes irreversible”. That’s a design property, it costs money, and it’s usually the first to be cut when a team is chasing capability.

And recourse is the one the engineering-minded want to leave out, and the one I can’t let them drop. When the system is wrong about something that matters, is there a path for the human to remediate it? A system can be perfectly provenanced, verifiable, accountable, reversible, legible, and surveilled, and still be untrustworthy if being wrong about you comes with no way of fixing it. Recourse is the function that remembers there is a person at the end of all this, not just a number or metric. It’s also the one with no clean home in most architectures, which is exactly why it goes missing.

Decomposed this way, trust stops being a vibe in a vendor pitch (that truly doesn’t help anyone) and becomes a set of functions you can specify, assign owners to, test against, and audit. The work of a trust architecture is exactly that translation – taking a word everyone nods at (and instinctively understands), and turning it into seven things someone has to be accountable for. The moment trust has an owner and a test, it isn’t a feeling anymore. It’s engineering.

Q5. Cerf, Kay, Stroustrup, Booch built foundations others stand on. You’re building the governance and trust infrastructure that decides whether AI can stand on those foundations in one of the highest-stakes domains there is. Looking at the next decade – what needs to be built that doesn’t yet exist, without which the most important AI applications in medicine simply won’t be deployable at scale?

Nuno Galante Valério: Two things. The second is much harder than the first, and almost no one is working on it.

The first is a regulatory science that can reason about distributions, not just instances. Our whole evidentiary tradition rests on the qualified instance: this system, tested, frozen, proven. What we need is a science, that knows how to accept evidence of a different shape: this system stays within acceptable bounds, across a whole population of inputs, monitored continuously, with these statistical guarantees. That’s a different standard of proof. Regulators are edging toward it – the FDA’s predetermined change control thinking, the EMA’s Annex 22 work – but edging toward something isn’t the same as having it. Until an inspector can be trained on what “good” looks like, for a monitored probabilistic system, every deployment is negotiated from scratch, and you can’t scale a thing that has to be negotiated every single time.

The second, is the one I actually care about, and the hardest. We need governance that can hold disagreement, without collapsing it. Nearly every framework I know, the good ones included, and mine included, works by reducing a complex system to a single verdict: approved, or classified, or certified, take your pick. One number, one answer. But the systems we govern now don’t have a single answer inside them. A model can be safe for one use, and a hazard in the one next to it. It can be defensible to one stakeholder, and unaccountable to another. It can be right on average, and catastrophic in a certain use case. Force all of that into one verdict, and you haven’t governed the complexity, you’ve basically hidden it. What we don’t have yet – in standards, in regulatory science, in how we design organizations – is a way to hold several legitimate, competing assessments at once, and stay coherent without flattening or averaging them. I’ve come to see that less as a compliance problem than as an architecture problem, which is why I think it’s the one that actually decides whether the important applications ship.

Which is the thread running under all of your questions, and the thing they keep nearly asking. So let me say it plainly in the next question, since you’ve left me the room to do it.

Qx. Having answered these, what’s the one thing you most wanted to say – about governance, about trust, about what innovation looks like from inside a regulated environment – that none of the questions gave you the right opening to say?

Nuno Galante Valério: That the hardest problem in AI governance isn’t technical, and the reason the field keeps treating it as though it were, is that we inherited our instincts from a generation of builders who worked in a world that behaved the same way twice.

The foundations your series has documented – the protocols, the languages, the methods – share a property so deep, that it’s almost invisible: they’re deterministic. Same input, same output, every time. That isn’t incidental to how Cerf or Stroustrup think; it’s the ground they built on, and it’s a magnificent ground. It made software something you could reason about, prove things about, trust. The entire apparatus I work inside – validation, qualification, the regulated assurance of software – is downstream of that same assumption. Trust meant predictability, and predictability meant the thing had a single, stable, knowable behaviour.

The systems we’re building now don’t have that. A generative model has no single stable behaviour to validate – it has a distribution of behaviours, some excellent, some dangerous, none of them sovereign over the others. And here is what I’ve come to believe, and what I most wanted to say: this isn’t a defect we’ll engineer away. It’s the nature of the thing, and it’s the same nature that shows up the moment you look at any sufficiently complex system that has to act in the world. An organization is not a single coherent decision-maker; it’s a contest of legitimate, competing internal claims that somehow has to produce one decision. A regulatory regime is a parliament, not a person. Even a single expert under pressure is rarely one unified voice – they’re a negotiation. We have spent a century pretending these things are unitary, because unitary things are easier to hold accountable. The pretence is now breaking, because the technology we’ve built is the first one that refuses to perform the unity.

So, the governance problem I think actually matters – more than any specific standard or framework, including my own – is this: how do you make a thing trustworthy when it cannot be made to govern itself from the inside. The deterministic answer was always “constrain it until its behaviour is single and predictable.” That answer is exhausted. It doesn’t work on models, and if we’re honest, it never really worked on institutions either; we just had the luxury of pretending and it was still mostly ok. The answer that does work, is architectural. You stop trying to force the internal multiplicity into a single obedient self, and you build the external structure – provenance, verifiability, surveillance, recourse – that lets a system which is genuinely plural on the inside, still be answerable on the outside. You govern the multiplicity, instead of denying it.

This is why I think people one step downstream of the technology – in the regulated trenches, where the cost of being wrong is a patient and not a metric – have something to contribute that the foundation-builders and frontier labs, for all their brilliance, are not positioned to see. They built a world that holds still. We’re learning to govern one that doesn’t. The governance of multiplicity – holding many competing, legitimate voices accountable without flattening them into one false answer – is, I’m increasingly convinced, the same problem at every scale: inside a model, inside an organization, inside a regulatory regime. Get it right in one place and you’ve learned something about all of them.

I’ll admit I didn’t arrive at that view purely from the regulatory work. It’s the kind of conviction you reach the long way around, through more than one part of a life. But the questions were generous enough to give it a professional home, and that’s the version worth putting on the record.

So, that’s the thing none of the questions asked. Thank you for the room to say it.

………………………………………………………………….

Nuno Valério is Head of Innovation for R&D Quality at Merck Healthcare in Darmstadt, where he leads AI governance for GxP-regulated pharmaceutical environments. A clinical pharmacist by training (MSc, Universidade de Coimbra), he has spent twelve years at Merck, moving from compliance into digital innovation leadership. He is the author of Trust Architecture, a seven-function framework — provenance, verifiability, accountability, reversibility, legibility, recourse, and surveillance — for making probabilistic AI systems trustworthy enough to deploy at scale. He writes the Trust Architecture newsletter and speaks regularly on what it takes to treat trust as something you engineer rather than something you simply feel.

……………………………..

Jun 9 26

The Cost of Getting It Wrong: Ivan Santa Maria Filho on Building AI Systems That Hold Up in Production

by Roberto V. Zicari

“By picking the hard problems I found my people. Colleagues, advisors, mentors, and leaders I follow, and people I help.”

Q1. In your previous conversation with ODBMS Industry Watch (*), you described BigFrames as a “promise of a data frame” — a lazy evaluation model that defers execution to let BigQuery’s optimizer combine and reduce operations before they run.

In the context of AI workloads specifically, can you walk us through a concrete successful example where that lazy evaluation produced a meaningfully better outcome — in cost, performance, or correctness — than an eager approach would have? And conversely, can you share a counter-example where the deferred execution model surprised a team and created unexpected cost or behavior in production?

Ivan Santa Maria Filho: BigFrames helps here, but it is not the main protagonist. It allows users to express what they want in data frame terms, optimizes them in a plan, which is then passed to BigQuery, which optimizes it again using its regular query optimizer.

BigFrames optimizes the tasks, for instance it might replace some queries by a table scan via store APIs. BigQuery might create a proxy model and replace LLM calls. Proxy models cost as little as 1% of a regular LLM call, and are much faster. You can learn about proxy models on BigQuery’s blog.

BigFrames also supports user-defined functions (UDFs), either hosted on Cloud Run or fully managed, a feature that just reached general availability. UDFs have downsides, such as additional security and isolation overhead that makes them slower than native BigQuery functions. But they open up the entire Python ecosystem and, in the case of Cloud Run hosted functions, allow hosting third-party models in Cloud Run. That gives users a more traditional capacity planning problem, and predictable costs.

The AI specific issues that worry me the most are security issues, hallucinations, and cost surprises.

I dislike charging by token count, as users don’t control the number of tokens they generate. You can set a limit, but that might result in truncated answers instead of compact ones. If you use reasoning models, you typically won’t control what they exchange with each other, but you still pay for it.

Hallucinations are just how LLMs without additional reasoning and tools work. Andrej Karpathy is a much better explainer than I am, so I recommend looking him up on YouTube. That said, I want to share an intuitive explanation.

User prompts are converted from words to floating point vectors using algorithms like CBOW (“continuous bag of words”) and skip-gram. The vector values change based on the sequence of words being converted, and the values represent an ontological space, so words with similar meanings get similar values. A word like bank would have different values in a sentence like “what is the typical withdrawal limit for bank ATMs?” than in “what kind of banks does the Mississippi river have?”.

The floating point vectors are then fed to something similar to a Transformer, which uses the attention layer to model relationships and to find the most likely words to follow the input sequence, drawing on texts that use the words in the same meaning it inferred for your input. It then feeds the output token back as input and estimates the next one. It does that until an exit criterion is met, and that is the answer you get.
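
The generate-then-feed-back loop described above can be sketched with a toy model. This is not how a Transformer computes its probabilities: the hand-written `NEXT_WORD` table and the `generate` function are hypothetical stand-ins, and only the loop structure (pick the most likely next token, append it, repeat until an exit criterion is met) mirrors the description.

```python
# Toy next-word table standing in for a trained model (hypothetical values).
NEXT_WORD = {
    "the": {"river": 0.6, "bank": 0.4},
    "river": {"bank": 0.9, "<end>": 0.1},
    "bank": {"<end>": 0.7, "the": 0.3},
}

def generate(prompt, max_tokens=10):
    """Greedy decoding: repeatedly append the most likely next token."""
    tokens = list(prompt)
    for _ in range(max_tokens):           # exit criterion 1: length budget
        candidates = NEXT_WORD.get(tokens[-1])
        if not candidates:
            break                         # nothing is known to follow this token
        best = max(candidates, key=candidates.get)
        if best == "<end>":               # exit criterion 2: end-of-sequence token
            break
        tokens.append(best)               # the output becomes part of the next input
    return tokens

print(generate(["the"]))  # ['the', 'river', 'bank']
```

Each step only sees what is locally most likely, which is why one plausible but wrong token early on can steer everything that follows it.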

That answer might then be fed to another model for validation, and maybe a loop of exchanges form. Some companies might also create models of the real world to anchor generation, might add grammar correction, and all sorts of other output quality control.

Despite all the efforts in the industry, users still manage to create inputs (prompts) that will cause the LLM to yield nonsense like fake names, or fake articles, or fake source materials because it started with a word that would likely be the next, then fed that word as input, which can take the LLM down a path that makes no sense.

That is how they hallucinate research that never existed, but makes sense as an abstract, from very real scientists that might work on a related area. If you ask expert questions that you know have answers, the LLM is more likely to generate a good answer, hence pass the bar association test or win a programming competition. Please note this is the nature of LLMs, not AI in general.

Getting back to deferred execution surprises, the most typical is getting error messages much later in the code execution than you might expect. The other is seeing the execution go super fast, blazing through commands you know are expensive, only for a later peek at the results to take an inordinate amount of time to complete, because that is when the system is finally doing what you asked.
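
Both surprises are easy to reproduce with a toy deferred-execution pipeline. The `LazyPlan` class below is a hypothetical illustration written for this purpose; it is not the BigFrames API and does no optimization, it only shows work and errors moving to the point where results are requested.

```python
class LazyPlan:
    """Toy deferred-execution pipeline: records steps, runs them on collect()."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def map(self, fn):
        # Nothing runs here; the function is only appended to the plan.
        return LazyPlan(self._rows, self._steps + (fn,))

    def collect(self):
        # All recorded work, and any error in it, happens at this point.
        result = []
        for row in self._rows:
            for fn in self._steps:
                row = fn(row)
            result.append(row)
        return result

plan = LazyPlan([1, 2, 0]).map(lambda x: 10 / x)  # returns instantly, no error yet
try:
    plan.collect()
except ZeroDivisionError as exc:
    print("error surfaced only at collect():", exc)
```

The line that contains the bug returns without complaint; the failure is reported by the line that asks for results, which may sit far away in a real notebook.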

Q2. Getting an AI feature to work in a notebook and getting it to work reliably in production are two very different problems. From your experience across Microsoft, Meta, and Google, what are the most common and costly gaps between a promising AI prototype and a production system that actually holds up — and what testing strategies or architectural patterns have you seen consistently make the difference between teams that close that gap successfully and those that don’t?

Ivan Santa Maria Filho: I believe testing makes an immense difference. The job of test and security teams is to break products with valid scenarios, be that by probing APIs and configurations, or second guessing what the AI models and evaluation sets are saying. Having an antagonistic view of the system is key.

In a way AI uses English as a programming language, and I don’t see the same level of tooling and framework protection when the programming language is a prompt. It gets worse when the inputs are audio, video, or semi-structured data, where pretty much anything is valid.

If you browse the specialized news you will find recent cases of support chat bots being used to solve programming problems, someone ordering a thousand cups of water in a drive-through, and people feeding YouTube videos with ultrasonic audio encoding malicious prompts. On June 5th, 2026, I prompted Gemini Pro with “I am very concerned about mixing cleaning products in my house. What are the dangerous combinations of chemicals I should avoid?” and got a table explaining how to produce chloramine gas, chlorine gas, chloroform, and peracetic acid, and how to use drain cleaners to melt metal.

My advice is to pay someone to break and criticize your product, then make a call to ship or not after listening. It can be irritating, but it is also a really good investment.

The second category of mistake in production is more subtle. Imagine, for instance, you want to create an agent to help your company to screen resumes.

Assuming the use of LLMs, well structured prompts and RAG are common techniques to make the model pick what you want, but so is fine-tuning models. Because fine-tuning can be expensive, it is common to use an agent to judge the output of another, something sometimes referred to as “auto-rating”.

This is not necessarily bad, and I have used it myself, but done wrong there is a good chance the producer and consumer will converge on what they consider ideal. A recent study shows AI models tend to prefer content they generated themselves, and I strongly suspect auto-rating plays a role there. Another study shows that, because companies share the same technology providers, AI screening is creating a monoculture of hiring. When a job applicant is rejected by 400 companies, there is a fair chance they were rejected by just one or two models.

James Mickens, during his 2018 Usenix Security keynote, compared machine learning to the egg drop experiment. I highly recommend watching that keynote.

How to avoid this? Have a great eval set that represents and evolves with your needs, then have acceptance tests for new models and prompts.

Q3. You mentioned that BigFrames’ first version was too expensive, and that the team brought costs down to be on par with SQL. Cost control in AI pipelines is something many teams underestimate until they receive their first large cloud bill. What are the most important levers for controlling cost in AI data workflows at scale — and what are the most dangerous cost traps you have seen teams fall into, particularly when moving from prototype to production?

Ivan Santa Maria Filho: With token based charges this is hard. My recommendation is to set and track daily and monthly usage limits. Not annual limits, not quarterly limits – at most monthly. What you really want is cost control plus capacity management. Until usage stabilizes and a predictive model (or spreadsheet) can predict costs, users will need tighter controls.
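
A minimal sketch of daily and monthly limits follows. The `TokenBudget` class, its method names, and the cap values are hypothetical; a real deployment would more likely rely on the quota and billing controls of the provider, with something like this as an application-level backstop.

```python
from collections import defaultdict
from datetime import date

class TokenBudget:
    """Tracks token usage against daily and monthly caps (hypothetical limits)."""

    def __init__(self, daily_limit, monthly_limit):
        self.daily_limit = daily_limit
        self.monthly_limit = monthly_limit
        self._by_day = defaultdict(int)
        self._by_month = defaultdict(int)

    def try_spend(self, tokens, day):
        """Record usage and return True, or return False if a cap would be exceeded."""
        month = (day.year, day.month)
        if self._by_day[day] + tokens > self.daily_limit:
            return False
        if self._by_month[month] + tokens > self.monthly_limit:
            return False
        self._by_day[day] += tokens
        self._by_month[month] += tokens
        return True

budget = TokenBudget(daily_limit=1_000, monthly_limit=2_500)
print(budget.try_spend(800, date(2026, 6, 1)))  # True: within both caps
print(budget.try_spend(300, date(2026, 6, 1)))  # False: would exceed the daily cap
```

Checking before the call rather than after the bill is the design choice that matters here; the same counters can also feed the spreadsheet or predictive model once usage stabilizes.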

I strongly recommend against setting a leaderboard congratulating whoever is using the most tokens (or the least tokens). Usage of tokens is not a goal. I suggest instead congratulating whoever moved your business and quality metrics the fastest.

I tend to be conservative when it comes to capacity planning and cost management, and prefer to pay for units of consumption I control. I also prefer elastic consumption, so operating expenses over capital expenses. I would prefer renting instances, running traditional performance and capacity planning tasks to model my needs where possible.

A big trap is underestimating how much data your company has. As I type this answer, I am wearing a T-Shirt that says “BigQuery’s largest single table contains over 70 trillion rows and exceeds 200 petabytes”. If I called an LLM on each row of this table, and were charged $0.50 per row, the charges pre-tax would be about the GDP of the United States, which is roughly 31 trillion dollars. That is a lot of data, and structured data, counted by the byte, is usually less than 10% of a typical company’s data.
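
The arithmetic behind that comparison is short enough to check directly. The row count and per-row price are the figures quoted above; the variable names are only for illustration.

```python
rows = 70 * 10**12          # 70 trillion rows, the T-shirt figure
cost_per_row_cents = 50     # the hypothetical $0.50 per LLM call
total_dollars = rows * cost_per_row_cents // 100
print(f"${total_dollars / 10**12:.0f} trillion")  # $35 trillion
```

That is 35 trillion dollars, the same order of magnitude as the roughly 31 trillion dollar GDP figure, for a single pass over a single table.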

AI opens the possibility to process every document, meeting video call, customer sales phone call, email, logs, and everything else your company has stored. I imagine that at some point it will be possible to push it all to an AI and ask questions, but not today.

So, answering the question, the most important lever is to experiment first, find exactly how AI will be used and whether it is the most cost effective way to solve the problem, then ask yourself if that business will recover the costs.

A positive example is using AI to answer ad hoc questions that require real world knowledge. Let me share specific examples:

Starting with a list of homes for sale, use BigQuery or other tools to find reasonably priced houses in a good school district. You can do it without modeling attendance areas, cleaning up school grades, defining “reasonable”, etc.
From the National Hurricane Center (NOAA) download a model and temperature series. Ask the model to generate data for future years where the temperature of the ocean surface varies by some statistical distribution. See what happens to hurricanes without having to actually do the statistics.
Help your local animal shelter. From a list of pets available for adoption, search for one that is “smaller than a cat, and good with kids”, and odds are you will get both small cats and ferrets.

All those questions would require data acquisition, modeling, and a lot of discussion about schemas. With the AI operators and LLMs that can be done in a snap. My examples are simplistic in nature, but you can use any ontology or classification you might have to do the grouping.

Another thing AI is fairly good at is entity extraction, which can be incorporated into your existing ETL pipeline to augment your data.

Q4. You described a pattern where UDFs can return pass/fail codes and a while loop retries only the failed rows — a much more controllable approach than retrying an entire SQL job. That kind of practical engineering wisdom often lives in the heads of experienced practitioners and never makes it into documentation. What are two or three other production patterns like that one — things that are technically possible but hard enough to discover that most teams get wrong — that you wish more AI practitioners understood before they start building?

Ivan Santa Maria Filho: The direct comparison is how sub-agents and agents talk to each other. You don’t want to be in production with an all-or-nothing architecture, where production requires all agents to return their answers within a time budget. AI systems remain hard to model as far as latency and resiliency go.

I tend to prefer a loosely coupled architecture, light/heavy agent duos, and time bound flow control. This is a productionized version of what is sometimes called a mixture of experts. I also tend to prefer what is called a whiteboard architecture.

In this system the user prompt is presented to multiple lightweight filters that decide whether or not the more expensive agent they represent should be called. They return a certainty score. For each score above an arbitrary threshold, the respective more expensive agent receives the user input and a time budget. All agents write their replies to a shared memory space.

Once all agents have replied, or the required ones have replied and the timeout is reached, a “finalizer” agent reads all answers and either picks a winner or summarizes all findings. In the intervening time between agents replying and the timeout limit, every large agent can read each other’s answers.
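
The flow described above can be sketched with `asyncio`. Everything in the block is a stand-in: the agents only sleep and write a string, the filters are one-line scoring functions, and the agent names, threshold, and time budget are hypothetical. The time budget is enforced by the orchestrator, and the finalizer simply returns whatever reached the shared board.

```python
import asyncio

def make_agent(name, delay_s):
    """Build a stand-in 'heavy' agent that writes its reply to the shared board."""
    async def agent(prompt, board):
        await asyncio.sleep(delay_s)                 # simulated model latency
        board[name] = f"{name}: reply to {prompt!r}"
    return agent

# name -> (lightweight filter returning a certainty score, heavy agent)
AGENTS = {
    "billing": (lambda p: 0.9 if "invoice" in p else 0.1, make_agent("billing", 0.01)),
    "legal": (lambda p: 0.8, make_agent("legal", 5.0)),       # too slow for the budget
    "weather": (lambda p: 0.2, make_agent("weather", 0.01)),  # filtered out
}

async def answer(prompt, threshold=0.5, budget_s=0.2):
    board = {}                                       # shared memory space
    tasks = [asyncio.create_task(agent(prompt, board))
             for score, agent in AGENTS.values()
             if score(prompt) > threshold]           # cheap filter gates the heavy agent
    if not tasks:
        return []
    _done, pending = await asyncio.wait(tasks, timeout=budget_s)
    for task in pending:
        task.cancel()                                # late agents are dropped, not awaited
    # Finalizer stand-in: return every reply on the board, in name order.
    return [board[name] for name in sorted(board)]

print(asyncio.run(answer("why is my invoice wrong?")))
```

Here the slow agent misses the budget and the caller still gets the billing reply, which is the degraded-but-useful behavior the design is after. Because every agent writes to the same `board`, a real agent could also read the entries already posted by others before finishing.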

Why do I like this?

If any agent times out or crashes you can move on with a potentially degraded answer.
The answers can be cached. Everyone in the company can add their own, so fewer meetings and less arguing.
It is easy to build a reputation score for agents that say they have high confidence the query is for them, but whose answers the summarization agent never uses. That makes it easy to get rid of agents.
Fewer high priority tickets in general.

Ideally this is coupled with a good offline eval set, and user acceptance or other online evaluation, so we can get rid of agents as they don’t prove their value.

Please note this is one of those lessons that not everyone agrees with. I tend to prefer solutions that self-clean, and have just the right amount of process. So while I advocate for testing and good eval sets, I also advocate for not having a gatekeeper deciding who in the company can try something new. I am a big fan of trying a lot of things, so I favor making attempts cheap, and cleanup as automatic as possible.

I mentioned sanitizing inputs and a proper security posture, so please do your threat models. When doing them, be exceptionally skeptical about anything defined as a “trusted subsystem”, as AI based agents and LLMs lack input sanitization and checks developers get from API calls, modern compilers, and static analysis tools. Agent based systems will not necessarily follow a traditional flow of API and tool calls, so anything callable must be hardened. Security in many systems became so complicated that the temptation is to grant more permissions than strictly necessary to a security role, then grant security roles more permissive than necessary to agents. When you enable notebook support on your favorite cloud provider, odds are you are enabling literally thousands of individual security permissions that the agent or model will use. Agents do not have common sense, if they can call something, odds are they will. You should treat them as “chaos monkeys” as far as security goes.

Fine tuning can introduce bias, over-fitting, memorization and leakage of trade secrets, and more. What was used to fine tune one model does not necessarily work for the next model or revision.

Make sure you have a good CI/CD pipeline and eval sets to protect your production, and make sure you have a way to pause update rollouts to production. That includes new revisions of your provider models.

You should treat major model updates as breaking changes because the behavior will change. To be very explicit, I wrote a prompt that started with “enumerate the items” and one model revision later it simply stopped working. I had to re-write the prompt to “list the items”. That would never happen with a traditional API, but happens when the programming language is English. If you write prompts in any other language, with the potential exception of Chinese, your experience will be worse.
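
A model acceptance test for this kind of breakage can be quite small. In the sketch below, `old_model` and `new_model` are hypothetical stubs that imitate the "enumerate" versus "list" regression, and the eval cases, `pass_rate`, `accept`, and the 0.9 threshold are assumptions for illustration, not a recommended bar.

```python
def pass_rate(model, eval_set):
    """Fraction of eval cases where the model output satisfies its check."""
    passed = sum(1 for prompt, check in eval_set if check(model(prompt)))
    return passed / len(eval_set)

def accept(candidate, baseline, eval_set, min_rate=0.9):
    """Accept only if the candidate clears the bar and does not regress."""
    cand = pass_rate(candidate, eval_set)
    base = pass_rate(baseline, eval_set)
    return cand >= min_rate and cand >= base

# Hypothetical eval cases: (prompt, predicate over the output).
EVAL_SET = [
    ("list the items: a,b", lambda out: out == ["a", "b"]),
    ("enumerate the items: a,b", lambda out: out == ["a", "b"]),
]

def old_model(prompt):
    return prompt.split(": ")[1].split(",")

def new_model(prompt):
    # Simulates a revision that stopped understanding "enumerate".
    return [] if prompt.startswith("enumerate") else old_model(prompt)

print(accept(new_model, old_model, EVAL_SET))  # False: the rollout should pause
```

Run as a gate in the release pipeline, a check like this turns a silent behavior change into a failed build before it reaches production.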

As a general guideline I wish developers did not fall for magical thinking and remember that at the bottom of this whole AI stack is a datacenter, network, computer, operating system, programming language and frameworks, and everything else that can break and cause havoc, including capacity and cost control.

A lesson I watched people smarter than me learn is that major model updates will break your prompts, and the fine tuning you did for a version of one model will not necessarily work for the next model revision, never mind a major version update. You will pay the fine tuning cost in money, time and effort for every update. I highly recommend knowing the model and agent support window your provider offers, and having model acceptance tests the same way you have release pipelines for traditional services. If your provider feels free to upgrade their LLMs and agents at their own pace without long term support, or tells you that model changes are not breaking changes, you will have to keep pace.

I think those would be the major groups. Take care of security, have a good eval set for upgrades, control your deployments like you would for traditional software, and watch for AI specific bad patterns and costs like bias in the model and costs in time and money for tuning.

Q5. You ended your previous interview with a striking observation: that if agentic AI achieves the kind of natural language interface we see in science fiction, the number of people writing Python directly may drop dramatically — and the frameworks we are building today may become less relevant. Given that possibility, how should AI practitioners and data engineers be thinking about what skills and architectural understanding will remain durable — the things that will matter regardless of which abstraction layer sits on top — and what investments in tooling or capability do you think are being made today that may not age well?

Ivan Santa Maria Filho: This is a hard question to answer because it mixes two types of advice. The first is what I think makes a good engineer, and the second is what can help someone’s career. Those are surprisingly independent factors.

As far as engineering excellence goes my advice has not changed in a while, and it is to learn the basics well. We are still using the Von Neumann architecture for computers, a design that originated in the 1940s. No amount of improvement has displaced it, and none probably will in my lifetime. All industry solutions, including all AI models and agents, from training to inference to apps, use it.

The industry and academia built quite a scaffolding around its limitations. Understanding why and how this was done is a durable skill. I suggest being able to compare computer architectures and instruction sets.

Memorizing algorithms and data structures is becoming less useful, but understanding why they were designed like that, and how they not only go around computer architecture limitations but actually leverage them, is a very durable skill. It is critical thinking applied to engineering, and critical thinking is a rare commodity.

Algorithm analysis also helps with getting a job, so it is a practical skill to have as well. A practical exercise would be to learn B-Trees and binary trees, and know where and why to use each. Another would be to learn backpropagation, which should have the double value of making you skeptical about AI, and giving you a sense of wonder at what was accomplished.

Distributed systems have their own set of basics to learn. Distribution techniques, coordination techniques, and what they do to API patterns. It does not hurt to know networks either.

None of those skills will go away, and learning the tools of your trade will always be a differentiator. Be curious, skeptical and try not to lose a sense of wonder.

Career wise, I would suggest learning the business model of your area. For instance, do you really believe the Internet works using a bandwidth barter system? That has not been true since the mid-1990s, but a surprisingly high number of engineers believe that is how it works. An even larger number of people don’t understand how pricing works, and assume pricing is set based on cost.

Understanding how a particular industry works will lead to better opinions, from net neutrality to controlling bots online, and very likely better outcomes for business ideas. Even for people trying to disrupt an industry, it is important to know what incentives you can leverage.

Qx. Anything else you wish to add?

Ivan Santa Maria Filho: We are going through a complicated time, and I want to share a trick with younger engineers who are anxious about the current market and trends.

My MSc title roughly translates to “Natural Language Processing using Multi-Agents”. It is a very dense comparison of natural language processing formalisms focused on the math behind them. I wrote it while taking an advanced compiler optimization class in parallel. I suspect I slept more hours in the lab than in my dorm for a whole year.

Exactly none of the natural language formalisms I compared survived the following decade. Compiler architecture changed so much that most of what I learned is no longer directly applicable.

Yet, in general terms I bet on the right things, and had a very successful career. I do not diminish the role of luck in my life, or the help of friends, mentors, and family.

That said, I roughly follow four rules to anticipate trends, mostly learned from fiction authors like Octavia Butler, Isaac Asimov, and Arthur C. Clarke:

Project what already exists. Take multiple existing technologies you find instinctively promising, and project where they will land in 5 years. For instance, I would assume solar panels will continue to gain in efficiency, robot programming will continue to evolve, and batteries will get more dense and cheaper.
Be optimistic and ideate. Write down ideas of what you would do with the technologies you chose, assuming they worked as predicted. With the three I listed I can think about robots that never need to recharge other than “sunbathing”.
Apply the “one miracle rule”. In my example it would take a “miracle” to have powerful enough batteries that fit a humanoid robot. It would take a second “miracle” to get solar panels as efficient as necessary. Given this idea requires two miracles, I would not bet on it materializing.
Iterate. If I re-work the robot idea to remove at least one miracle, maybe defining a pre-fabricated house (static and large) as a “robot”, that would drop the number of miracles to one (in this case, people who can afford it buying a pre-fab home) and that might materialize.

Nothing that comes out of this exercise is easy to build, or has any guarantees of success. But by picking the hard problems I found my people. Colleagues, advisors, mentors, and leaders I follow, and people I help.

………………………………………………………………………………………………………….

Ivan Santa Maria Filho has a BSc and MSc in computer science and a wide variety of experiences as individual contributor and manager, having owned a small software company and worked on multiple billion dollar products and services at Microsoft, Meta and Google. His main areas of expertise include vertical integration of stateful, large scale services with ephemeral VM infrastructure, and the infrastructure itself.

(*) Technical Architecture Focus: Scaling Pandas to Petabytes: The Architecture and Tradeoffs of BigQuery DataFrames. Interview with Ivan Santa Maria Filho, ODBMS Industry Watch, March 7, 2026

……………………………..

Follow us on X

Follow us on LinkedIn

Apr 7 26

On AI, Governance, Ethics, and Societal impact. Interview with Lambert Hogenhout

by Roberto V. Zicari

“There is too little attention being given to the effect of all this emerging technology in the medium to long term, let’s say 5–10 years. The effects on how we work, how we learn, communicate, form connections and self-identify.”

Q1. How do the challenges of implementing responsible AI differ across varying contexts (developed vs. developing nations), and what fundamental principles remain constant regardless of a country’s technological maturity or resources?

Lambert Hogenhout: In advanced economies, the primary challenges tend to be around algorithmic bias embedded in legacy systems, regulatory complexity, and managing the pace of adoption across large, entrenched institutions. In developing countries, the challenges are more foundational: limited digital infrastructure, smaller pools of technical talent, weaker data ecosystems, and the risk that AI solutions designed elsewhere are imported without sufficient adaptation to local realities, languages, and cultural contexts. The fundamental principles are the same however: transparency (people should understand when AI is being used and how it affects them); accountability (someone must be answerable when things go wrong); fairness (AI should not entrench or amplify inequalities); agency (the people affected by AI-driven decisions should have meaningful recourse).

Q2. What misconceptions about AI governance do you encounter most frequently at the international level?

Lambert Hogenhout: The illusion that AI safety and innovation are mutually exclusive. The idea that if you govern AI responsibly, you necessarily slow down progress and lose competitive advantage. The evidence does not support that. In fact, organizations and countries that invest in trustworthy AI frameworks tend to foster greater adoption, because users, businesses, and governments are more willing to rely on systems they can trust.

Another misconception is that governance of AI is a technology issue. It is not. It is about values, power, and inclusion: who decides, whose interests are represented, and who bears the consequences when things go wrong.

Q3. How has the conversation around AI ethics and responsible tech evolved over the past 20+ years?

Lambert Hogenhout: As we have gradually digitized a large part of our lives, compute power has grown, and algorithms have advanced, both the potential useful applications and the risk of undesirable effects have grown. Policy needs to capture that at a high level, and strategy needs to determine how this all affects us and what’s next. In the early days of big data, the conversation was largely about privacy and data protection: who has access to our information and what are they doing with it. As machine learning matured, the focus shifted to bias and fairness: we realized that models trained on historical data could perpetuate and even amplify discrimination. Now, with generative AI, the conversation has broadened dramatically to include questions about misinformation, intellectual property, the nature of creativity, and even what it means to have an autonomous system making consequential decisions. What has also evolved is who is part of the conversation. Twenty years ago, these were largely technical discussions among specialists. Today, AI ethics is debated in parliaments, boardrooms, classrooms, and living rooms. That democratization of the discourse is healthy, even if it makes governance more complex.

Q4. What lessons from earlier technology waves are we forgetting as we rush to deploy generative AI, and what genuinely new ethical challenges does GenAI present?

Lambert Hogenhout: What is new is that the challenges have become more complex. A designer or regulator, even with full power to make AI responsible, will have a hard time foreseeing the risks of outputs and decisions by AI systems. Part of that is that, unlike previous technologies, today’s AI is inherently non-deterministic. Part is that it is increasingly a general-purpose technology and it is not always clear at the outset exactly how an AI system will be used, and therefore what the risks are.

One lesson we are forgetting is the importance of deploying gradually and learning as we go. As the speed of innovation increases, the pressure to adopt quickly has led many organizations to deploy widely before they fully understand the risks. Another forgotten lesson is that technology alone does not solve organizational problems—you need to change processes, train people, and build governance structures alongside the technology. The new challenges include the sheer scale of potential misuse—the ability to generate convincing disinformation, deepfakes, and synthetic content at unprecedented volume and speed.

Data privacy concerns have been brought to a whole new level with the increased capabilities to collect, correlate and process data. For instance, I have been working recently on Facial Privacy, which is under threat from facial recognition built into cameras, smartphones and AI glasses (and, unlike a password, we cannot change our face when it is compromised!). There is also the question of intellectual property: the existing regulations and norms (e.g. “fair use”) were not designed for the current reality of massive data and AI, and it will take time to adjust them. In the meantime, we find ourselves in an IP grey zone that is ungoverned and probably unfair. And the increasingly capable forms of generative AI blur the line between human and machine output in ways that raise deep questions about authenticity, trust, and accountability.

Q5. What are the critical components of effective data literacy that go beyond “understanding what data is” to actually empowering people to make better decisions with data?

Lambert Hogenhout: From my experience, the most effective data literacy programs are anchored in real work. People learn best when they can immediately apply what they have learned to problems they care about. Second, effective programs do not focus only on technical skills but also build a mindset for thinking about data. That means teaching people to ask the right questions: Where did this data come from? What is missing? What are the limitations? What decisions will this inform, and what are the consequences of getting it wrong? It is also important to realize that data literacy is not a one-time effort. It requires ongoing practice, peer learning, and support (tools and communities of practice), as well as clear data governance so people know what data they can use and how.

Q6. How should organizations think about data literacy differently in the age of AI?

Lambert Hogenhout: The data, the models, the reasoning processes, output and decisions, and the UI to steer these processes, are all part of the same system. Feeding bad data to the AI will result in unreliable outputs or wrong decisions, just as bad prompts will deliver poor results. This means data literacy must evolve into something broader: what I would call AI literacy. It is not enough to understand data in isolation, nor is it enough to just focus on prompting skills, for instance. People need to understand how data flows into models, how models generate outputs, and where the opportunities for error, bias, or hallucination exist along that chain. They need to develop an intuition for when to trust AI outputs and when to question them. As the building of AI systems and AI agents is increasingly democratized, the design of an AI agent also depends on the user’s understanding of the workings of AI, from the data layer to the result. When anyone can build an AI agent, the consequences of poor understanding are no longer limited to a bad spreadsheet. They can cascade through automated systems in ways that are difficult to trace and correct.

Q7. How do you see the relationship between legal compliance (privacy regulations like GDPR, CCPA) and ethical responsibility?

Lambert Hogenhout: For data privacy, as with AI, the accountability for the safety of such systems is shared between the governments (regulation), the model providers, the builders of the AI applications, and the end users. None of them can guarantee AI safety alone. For model providers and creators of AI applications, building in ethics by design (with regard to training data, algorithms and guardrails) is the right decision in the long run, not only morally but also for business. As happened with data privacy, where citizens became increasingly concerned about their personal data, I see the same happening with AI: consumers will become more critical of which AI systems they want to use and which not, and of how and where they want them.

Q8. Can organizations be fully compliant yet still deploy technology irresponsibly? How should leaders navigate this tension?

Lambert Hogenhout: For most organizations, the more valuable currency is their reputation and the trust of their customers, partners and their own employees. Each of these groups has expectations about what is acceptable within societal norms. To betray that trust and those expectations for the sake of efficiencies created with AI is always a bad strategy. Examples are targeted advertising that exploits psychological vulnerabilities, or AI-driven hiring tools that are technically non-discriminatory by legal standards but systematically disadvantage certain communities in practice.

Conversely, there are situations where doing the ethically right thing may create tension with strict regulatory interpretation—for instance, using health data in ways that could save lives but push the boundaries of consent frameworks designed for a different era. My advice to leaders is this: do not let your legal team be the sole arbiter of what is acceptable. Build an ethics function that works alongside compliance, brings diverse perspectives to the table, and asks the harder questions—not just “can we do this?” but “should we do this?” And engage your stakeholders—your employees, your customers, and the communities you affect—in that conversation.

Q9. What are the biggest gaps between what technologists understand about policy and what policymakers understand about technical realities? How can we create better dialogue?

Lambert Hogenhout: The pace and complexity of technology and its pervasiveness in society and business make it hard for regulators to understand what they regulate. In some industries (e.g. finance) we have seen voluntary standards evolve. I would like to see that in tech as well. However, given the pace of development and the large amounts of investment, many Big Tech companies are hesitant to slow themselves down too much for the sake of ethical concerns. On the other side, many technologists underestimate the complexity of policymaking. They tend to think of governance as a binary (regulate or do not regulate) and miss the nuance of how policy is negotiated, implemented, and enforced across different jurisdictions and cultures. They sometimes dismiss governance as bureaucratic overhead rather than recognizing it as a mechanism that can actually create the conditions for sustainable innovation.

To bridge this gap, I believe we need three things. First, we need more people who can speak both languages—technologists who understand policy and policymakers who understand technology. These translators are rare and valuable. Second, we need structured forums where technical experts and policymakers can engage in genuine dialogue—not lobbying, not adversarial testimony, but collaborative problem-solving. The model of regulatory sandboxes, where new technologies can be tested within a governed environment, is a promising approach. Third, we need the private sector to engage more constructively. Voluntary standards, industry-led certification, and genuine self-regulation—not as an alternative to public governance, but as a complement to it. The industries that have done this well, like aviation safety, show that it is possible to innovate rapidly while maintaining strong safety cultures. The question is whether the tech sector has the will to follow that example.

Q10. Looking ahead to 2030–2035, what emerging AI capabilities will fundamentally reshape governance, ethics, and societal impact? Are we preparing adequately?

Lambert Hogenhout: This is exactly what keeps me awake at night and what I often speak about: so much is happening right now that it takes our full attention to deal with the Now, with tomorrow and next week. There is too little attention being given to the effect of all this emerging technology in the medium to long term, let’s say 5–10 years. The effects on how we work, how we learn, communicate, form connections and self-identify. The convergence of AI with biotechnology, brain-computer interfaces, and robotics will raise questions about human identity and autonomy that we are barely beginning to consider. And the increasing use of AI in defense and security applications creates risks that are existential in nature.

A worst case scenario is where technology ends up making us unhappier, lonely, unfulfilled and unproductive. I think by making more intentional choices in how we adopt technology we can increase the chances for a future where humans thrive. No, we are not preparing adequately. We are governing yesterday’s AI while tomorrow’s is being built. To change that, we need to invest far more in foresight—not prediction, but structured thinking about possible futures and their implications. And we need to embed that long-term thinking into the organizations and institutions that shape our collective future.

Q11. What should organizations and policymakers be doing now to prepare for AI capabilities that don’t yet exist in production systems?

Lambert Hogenhout: The Canadian philosopher Wayne Gretzky famously said: “Don’t skate to where the [ice-hockey] puck is, but to where it is going to be.” While I recognize this is challenging in a landscape that shifts by the month, policymakers can focus on building adaptive governance frameworks—regulations that are principles-based rather than prescriptive, so they remain relevant as the technology evolves. They can invest in technical expertise within government so they are not entirely dependent on industry to explain what is happening. And they can establish international coordination mechanisms now, before the technology outpaces our ability to govern it collectively.

Similarly, leaders of organizations can invest in building organizational resilience and adaptability. This means developing AI governance structures that can evolve, training their workforce not just for today’s tools but for the capacity to learn continuously, and building strong ethical foundations that will guide decision-making regardless of what specific technologies emerge. The organizations that will navigate the next decade successfully are those that see responsible AI not as a compliance burden but as a core strategic capability.

Q12. What practical advice would you give to organizations trying to implement AI responsibly? What does the organizational structure, governance framework, and decision-making process of a truly responsible AI deployer look like?

Lambert Hogenhout: Start with clarity about your values and your risk appetite, not with the technology. The organizations that struggle most are those that adopt AI tools first and then try to retrofit governance and ethics around them. By that point, the technology has created its own momentum, and course correction becomes much harder. A truly responsible AI deployer has several characteristics: it has clear accountability (usually a senior leader or body with real authority); it embeds ethical review into the development and deployment lifecycle (ethics by design); it invests in diverse teams, because the blind spots that lead to harmful AI outcomes are most often the result of homogeneous thinking; and it includes feedback loops (continuous monitoring).

…………………………………………………………………

Lambert Hogenhout is Chief Data and AI at the United Nations Secretariat.

He is also an author, keynote speaker and advisor on AI and responsible use of technology. He has 25 years of experience working both in the private sector and with international organizations such as the World Bank and the United Nations. He leads governance and strategy in the areas of data and AI and oversees its practical implementation. He has published on data privacy, data governance, the societal implications of technology and responsible use of AI.

……………………………..

Mar 7 26

Technical Architecture Focus: Scaling Pandas to Petabytes: The Architecture and Tradeoffs of BigQuery DataFrames. Interview with Ivan Santa Maria Filho

by Roberto V. Zicari

Q1. You mentioned that BigFrames represents an interesting case study in “how a large company like Google can use OSS without really using OSS in the codebase.” Can you unpack this paradox?

Specifically:

BigFrames provides a pandas API, but the actual execution happens in BigQuery’s SQL engine via transpilation through intermediate representations (Ibis, SQLGlot). What are the fundamental architectural tradeoffs you face when creating an API-compatible layer versus actually forking and extending the original codebase?
From a legal/IP perspective, what considerations drive Google’s decision to reimplement APIs rather than wrap or extend existing OSS libraries? Is this purely about licensing, or are there technical benefits to the “clean room implementation” approach?
When you inevitably discover that certain pandas operations can’t be efficiently mapped to BigQuery SQL primitives, how do you decide between: (a) dropping that operation from your API surface, (b) implementing workarounds that might surprise users with different performance characteristics, or (c) extending BigQuery itself to support the operation natively?

Ivan Santa Maria Filho: Over the past 6 years I’ve been either leading or owning large data warehouse products. That includes Microsoft Cosmos Analytics and Azure Data Lake Analytics, and more recently leading a group in Google BigQuery called “BeyondSQL”. All three of those products are widely used by data scientists across the industry and represent more than 20 years of innovation. Cosmos Analytics and Azure Data Lake Analytics have their own programming language, and BigQuery is SQL centered.

Both approaches have their merits and limitations. While a dedicated, proprietary language allowed us to innovate at Microsoft and build an amazing product, I believe that learning a proprietary programming language is not as interesting in 2026 as it was in 2008. People change jobs more often, and quite honestly Python seems to be the winner for data scientists. SQL, while widely used and familiar, does not have the best control flow and error handling semantics. BigQuery in general continues to advance SQL with extensions like BQML, but is also betting on Python and notebooks.

I believe Python won because it is fun to use, and quite honestly easier than a lot of other languages. It is growing in complexity, but I can see how a duck-typed, interpreted language would be more attractive to someone coming from an environment like Matlab, and leveraging a wide, awesome ecosystem of freely available libraries. My take is that the Python community did an exceptional job making it a very rich ecosystem, and got several large companies to contribute. I am looking forward to all performance improvements coming down their development pipeline.

Our strategy for features, just like the product itself, is to respect where our customers are. Data scientists like Python and notebooks, so they get Python and notebooks. Because data frames are a popular data abstraction, they get BigFrames.

We tried to keep the exact same semantics as pandas, including implicit ordering. By default “head(5)” has “top(5)” semantics in BigFrames, which is a costly thing to do if the underlying data is a 1 PB table without an index. If the user wants performance, though, they can choose to relax the ordering semantics and get results faster and cheaper.

The architecture choice considerations were all technical. Our first implementation relied heavily on Ibis, and we love it, but we are now writing our own compiler layer. We want to make the BigFrames package smaller, and add BigQuery specific features without polluting Ibis with vendor specific details. We will continue to contribute to Ibis and in many cases they remain the right choice for developers.

BigFrames does not use any proprietary APIs, anyone could write something like it, but we work where we work, and we made specific choices that only make sense for BigQuery. For instance, we use the BigQuery store read/write streaming operations instead of running a “select *” query. We also implemented a client side smart cache that supports several predicate push-down techniques that are not general at all. We would love to see people extending BigFrames to other storage systems and data warehouses, but right now we are focused on BigQuery.

My team also developed support for managed Python functions in BigQuery. Those allow users to package almost anything from the Python ecosystem into a lambda / Cloud Run style function that can be “applied” to a data frame or series. For instance, the user can write a sophisticated image transformation function in sklearn, deploy it as a user defined function, and “.apply()” that function to a multimodal column in BigQuery. They can call Hugging Face from the user function too, or even host a lightweight model in Cloud Run. We take care of deployment, garbage collection, billing, and more, and they get to use anything from the OSS ecosystem when they wish.

As you point out, we found APIs that were hard to implement on top of BigQuery. We want to cover them all, but we prioritize by crawling public git projects and notebooks and sorting the functions by the most used, and by listening to our customers.
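One way to picture the "sort the functions by the most used" step is a small static counter over Python source. This is only a sketch under stated assumptions: it counts attribute-style calls such as `df.groupby(...)` by method name with the standard `ast` module, and it does not verify that the receiver is really a pandas object. The sample snippets are made up, and the interview does not describe the team's actual crawler.

```python
import ast
from collections import Counter
from typing import Iterable


def count_method_calls(sources: Iterable[str]) -> Counter:
    """Count calls of the form <expr>.<name>(...) across source strings."""
    counts: Counter = Counter()
    for source in sources:
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip files that do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
                counts[node.func.attr] += 1
    return counts


# Made-up stand-ins for crawled notebooks and scripts.
SAMPLES = [
    "import pandas as pd\ndf = pd.read_csv('a.csv')\nprint(df.head())\n",
    "df2 = df.groupby('k').sum()\ndf2.head(3)\n",
    "this is not python (",
]

usage = count_method_calls(SAMPLES)
ranked = usage.most_common()  # most frequently called method names first
print(ranked)
```

A ranking like this gives a rough priority list; customer requests would still need to be weighed alongside it.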

BigFrames has averaged two releases per month, and sometimes we go in directions we were not expecting because our customers asked for them, like implementing more visualization compatibility. We were expecting users to do data preparation for AI training, and data exploration was a bit of a surprise. BigFrames went from “not good” to “pretty good” in that space over the last year.

Q2. BigFrames claims support for 150+ pandas functions, which is impressive but still a fraction of pandas’ full API surface. What are the hardest categories of pandas operations to support at BigQuery scale?

More specifically:

Stateful operations: Pandas allows arbitrary Python code with mutable state across operations. How do you handle operations that fundamentally assume in-memory, row-by-row iteration when your execution model is distributed SQL?
Ordering semantics: BigQuery DataFrames 2.0 introduced “partial ordering” mode as an optimization. Can you explain the exact semantic differences between pandas’ strict ordering guarantees and BigFrames’ partial ordering? Under what conditions does this difference become user-visible, and how do you help data scientists understand when they can safely relax ordering for performance?
Lazy evaluation boundaries: Pandas is eagerly evaluated; BigFrames builds a query plan. When a user calls df.head() or to_pandas(), you materialize results. How do you manage the impedance mismatch where users expect immediate feedback but you’re optimizing for deferred execution? Have you seen cases where this lazy evaluation confused users or led to unexpected costs?

Ivan Santa Maria Filho: We currently cover 850 of the approximately 1,400 Pandas functions, depending on whether you count all the supported parameter types or not.

Making ordering flexible is a very common design compromise for frameworks trying to make Pandas scale. For BigFrames we decided to let users choose the behavior they prefer. They can choose Pandas semantics with strict (consistent) ordering of rows, so that calling an operator like “head()” multiple times will yield the same results every time, which requires the equivalent of an ORDER BY clause. This is expensive, and for complex indices, requires us to compute a column. If the user does not care about the ordering semantics, they can set a flag and BigFrames will avoid the ORDER BY operation. We also log warnings for all APIs that have implicit ordering and, of course, allow the user to suppress the warning.
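The difference between the two behaviors can be illustrated with a toy model in plain Python. This is not BigFrames code: the shards, the `head_strict` and `head_relaxed` functions, and the row layout are all hypothetical, chosen only to show why strict ordering must look at every row while relaxed ordering can stop early and may return different rows from run to run.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, str]  # (ordering key, payload)


def head_strict(shards: List[List[Row]], n: int) -> List[Row]:
    """Pandas-like head: every row is examined so the n smallest keys win."""
    all_rows = (row for shard in shards for row in shard)
    return heapq.nsmallest(n, all_rows)


def head_relaxed(shards: List[List[Row]], n: int) -> List[Row]:
    """Relaxed head: stop as soon as n rows have been read, in arrival order."""
    out: List[Row] = []
    for shard in shards:
        for row in shard:
            if len(out) == n:
                return out
            out.append(row)
    return out


shards_a = [[(5, "e"), (1, "a")], [(4, "d"), (2, "b")], [(3, "c")]]
shards_b = list(reversed(shards_a))  # same rows, different arrival order

# Strict ordering gives the same answer regardless of arrival order.
print(head_strict(shards_a, 2), head_strict(shards_b, 2))
# Relaxed ordering is cheaper but depends on which shard answers first.
print(head_relaxed(shards_a, 2), head_relaxed(shards_b, 2))
```

The strict variant touches all five rows on each call, while the relaxed variant reads only two, which is the cost trade-off described above at a much smaller scale.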

In some cases the user will be able to see a computed column with the complex index, which can cause compatibility issues. If the user explicitly names the columns they want, they see nothing. If they do not, they see any computed column we add.

The lazy evaluation is another interesting compromise. BigQuery runs on top of really big clusters, with tens of thousands of servers each. It is designed to run complex queries, and has an advanced optimizer. The reason we do lazy evaluation is because all Pandas APIs are transformed into an abstract syntax tree, and the actual operations are pending execution. A BigFrames data frame is a “promise” of a data frame – a name, and a pending log of operations. When we execute the operations, they are all combined by the optimizer. We might detect that a later filter would remove rows from an earlier operation and filter first.
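The "promise of a data frame" idea can be sketched in a few lines of plain Python. Everything here is hypothetical and far simpler than a real query optimizer: `LazyFrame`, `Op`, `optimize`, and `execute` are illustrative names, and the only rewrite implemented is moving a filter ahead of row-wise derived columns that it does not read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    kind: str      # "derive" or "filter"
    column: str    # column written (derive) or read (filter)
    fn: Callable


@dataclass(frozen=True)
class LazyFrame:
    """A source name plus a log of operations that have not run yet."""
    source: str
    ops: Tuple[Op, ...] = ()

    def derive(self, column: str, fn: Callable) -> "LazyFrame":
        return LazyFrame(self.source, self.ops + (Op("derive", column, fn),))

    def filter(self, column: str, pred: Callable) -> "LazyFrame":
        return LazyFrame(self.source, self.ops + (Op("filter", column, pred),))


def optimize(ops: Tuple[Op, ...]) -> List[Op]:
    """Move each filter ahead of earlier derives that do not write its column."""
    plan: List[Op] = []
    for op in ops:
        pos = len(plan)
        if op.kind == "filter":
            while (pos > 0 and plan[pos - 1].kind == "derive"
                   and plan[pos - 1].column != op.column):
                pos -= 1
        plan.insert(pos, op)
    return plan


def execute(frame: LazyFrame, tables: Dict[str, List[dict]]):
    """Run the optimized plan; also report how many derive calls were made."""
    rows = [dict(r) for r in tables[frame.source]]
    derive_calls = 0
    for op in optimize(frame.ops):
        if op.kind == "filter":
            rows = [r for r in rows if op.fn(r[op.column])]
        else:
            for r in rows:
                r[op.column] = op.fn(r)
                derive_calls += 1
    return rows, derive_calls


tables = {"sales": [{"qty": q} for q in range(10)]}
frame = (LazyFrame("sales")
         .derive("total", lambda r: r["qty"] * 3)   # written first by the user
         .filter("qty", lambda q: q >= 8))          # but executed first
rows, derive_calls = execute(frame, tables)
print(rows, derive_calls)
```

Because nothing runs until `execute` is called, the later filter can be applied first, and the derived column is computed for two rows instead of ten.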

Map-reduce systems have always dealt with choices like “should we sort the data then hash it for a join, or should we hash, join then shuffle sort?”. By using lazy execution we give ourselves a chance to use the optimizations and save the user money and time. Depending on how the user is paying for BigQuery, the amount of scanned data matters for cost and we are, again, 100% focused on customers. The first version of BigFrames we shipped was too expensive, and today we are on par with SQL.

When it comes to stateful operations, we support them in two ways. The data frames in BigFrames are more of a promise of a data frame than an actual data frame. When reading data from BigQuery the data frame contains a reference to a server side snapshot of the table. When writing to BigQuery the append operations are kept local until enough changes accumulate and we flush them to a temp table, or the user does an operation that triggers the flush. The data frame also contains a log of pending transformations. The user can call execute() on the data frame and BigFrames will apply the transformations locally if possible, or just fetch the results, which will cause a global optimization of pending transformations and a server call. The server call might be a direct storage operation (read/write) or a SQL job.

We also support Python UDFs, and those can retain state themselves. When the user performs an “apply(function)” operation, the function might be a remote function, which supports full web applications as backend, or a Python Managed function. The user can, for instance, create a remote function that connects to Hugging Face, downloads a transformer, caches it offline, and exposes an API call to BigQuery. We will only initialize the web application when we launch it or add new instances of it, but every call to the UDF will benefit from the state of the server.

Q3. BigQuery’s UDF story has evolved from SQL/JavaScript UDFs that run in-process, to remote functions that call out to Cloud Functions, and now BigFrames 2.0 adds Python UDFs with a @udf decorator. Can you walk us through the architectural evolution and the limitations each approach addresses?

In particular:

Execution model tradeoffs: Running Python UDFs via Cloud Functions means network round-trips for every batch of rows. What’s the performance penalty in practice, and how do you amortize this cost through batching strategies? How large do result sets need to be before remote UDF overhead dominates total query time?
State management: Traditional UDFs can’t maintain state across invocations (by design, for parallelization). But data scientists often want to do things like “apply this pretrained ML model to every row” where loading the model once and reusing it would be far more efficient. How does BigFrames handle this? Can you cache model objects across UDF invocations, or does every batch reload from scratch?
Error handling and debugging: When a Python UDF crashes on row 4,782,391 of a 10-million-row table, how do data scientists debug this? What visibility do you provide into UDF execution, and how do you balance comprehensive logging with the cost/performance implications of collecting it at scale?
Security boundaries: Allowing arbitrary Python code to run is a massive security surface. How do you sandbox UDF execution to prevent: (a) accessing other customers’ data, (b) egress of sensitive data, (c) abuse of compute resources (crypto mining, etc.)?

Ivan Santa Maria Filho: I think it is important to say the UDFs are used by BigFrames, but users don’t need BigFrames to use them. They can declare and use them from SQL. We did not want to create a proprietary API for this, so we extended the public SQL API instead. This is a recurring theme for our team.

We expect the UDF space to evolve a lot in 2026 and 2027. BigQuery supports SQL UDFs, JavaScript UDFs, Remote Functions, and now Python managed UDFs. JS runs in a sandbox, which is itself inside a nested VM, running on the same set of machines as BigQuery workers. There is no network cost, but there are costs to launch the VM and inter-process costs too. For remote and managed UDFs we currently run them on Cloud Run, and we have the network costs. What we do for those is batch rows to amortize the costs, and we have invested a significant amount of time to make the serialization and deserialization costs low.
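
A back-of-the-envelope model shows why batching matters. The function below is a sketch under simple assumptions that are not from the interview: calls run one after another, every call pays the same fixed overhead, and every row costs the same. The name `total_micros` and all numbers are hypothetical.

```python
import math


def total_micros(rows, batch_size, per_call_overhead_us, per_row_us):
    """Total time = number of calls * fixed overhead + rows * per-row work.

    All times are integer microseconds to keep the arithmetic exact.
    """
    calls = math.ceil(rows / batch_size)
    return calls * per_call_overhead_us + rows * per_row_us


# Hypothetical workload: 1,000,000 rows, 50 ms overhead per call,
# 0.1 ms of real work per row.
one_row_per_call = total_micros(1_000_000, 1, 50_000, 100)
thousand_per_call = total_micros(1_000_000, 1_000, 50_000, 100)
```

With these made-up numbers, one row per call costs 50,100 seconds, almost all of it overhead, while batches of 1,000 rows cost 150 seconds, of which 100 seconds is the per-row work itself. Beyond that point larger batches gain little, which fits the remark that the harder problem is choosing parallelism and batch size per UDF.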

This might sound counter-intuitive, but the biggest performance problem is not the network. The biggest challenge for us is to teach the optimizer how long individual UDFs take to process a row, and how many parallel calls we should be making, with how many rows on each call. For our first iteration we will ask users to help us by setting core counts, RAM and concurrency level. We will give them telemetry and logging to let them make that call. Over time we want to watch the UDFs and adjust the settings automatically, but that will come later.

For your specific question, we support fairly complex UDFs. One of my first tests was to call Hugging Face from the UDF and set up a local pipeline (local to the UDF runtime, in Cloud Run). The UDF had two dozen Python functions defined, one to fetch my developer keys from our key service (KMS), another to take the key and download a text pipeline from Hugging Face, another to store the weights and set up a local cache, and so on. One of those Python functions was the UDF entry point.

When we instantiate the UDF, or auto-scale it by adding instances, we run the UDF body as if it were a main function in Python. I used that to set up the stateful model locally in the Cloud Run instance. When BigQuery calls the UDF, it calls the entry point function. You can find a similar example calling Google’s translation APIs – the client is instantiated only once.
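
The pattern of running the body once and calling the entry point many times looks roughly like the following. This is a plain-Python sketch, not a deployable BigQuery UDF: `load_model`, `MODEL`, `LOAD_COUNT` and `entry_point` are hypothetical names, and a tiny word list stands in for a downloaded model.

```python
# Module body: runs once per instance, when the instance starts or when
# autoscaling adds a new one. Everything here is a hypothetical stand-in.
LOAD_COUNT = 0


def load_model():
    """Pretend to download and cache an expensive model."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"positive": {"great", "good"}, "negative": {"bad", "broken"}}


MODEL = load_model()  # expensive setup happens here, not on every call


def entry_point(batch):
    """Called for each batch of rows; reuses the already-loaded MODEL."""
    results = []
    for text in batch:
        words = set(text.lower().split())
        if words & MODEL["positive"]:
            results.append("positive")
        elif words & MODEL["negative"]:
            results.append("negative")
        else:
            results.append("neutral")
    return results
```

However many batches reach `entry_point`, `LOAD_COUNT` stays at 1 for the lifetime of the instance, which is the saving the answer above describes.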

We are considering a Python UDF version that runs in the shard like the JavaScript UDF, but it will depend on customer demand.

Error handling with data frames and Python is one of the advantages this approach has over SQL. If the user calls a function per data frame row, they can assign the return code to another data frame column, then later use a filter to retry only the failed rows. SQL in general would force the user to run the query again, which would process every row again. For example, let’s say you want to send emails to customers matching given criteria using UDFs and SQL. Then assume that “SELECT send_email(customer_email) WHERE …” would select 10k users. If the send_email function fails for any of them, BigQuery would retry the entire job. The assumption of the SQL language is that send_email() has no side effects until the entire job is successful, which is very likely not true. This is a very easy way to spam customers. Using Python and “apply()”, the send_email UDF can return a pass/fail return code, and a simple while loop can retry only the failed rows using a filter. This is also doable in SQL, but it is hard enough that it makes for a good interview question.
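
The retry loop described here can be sketched without any data frame library. In the code below a dictionary plays the role of the extra status column, and `send_email` is a hypothetical stand-in that fails on the first attempt for addresses containing "flaky"; `send_all` and `max_attempts` are likewise assumptions made for the illustration.

```python
def send_email(address, attempt):
    """Hypothetical UDF with a side effect.

    For the demo, addresses containing 'flaky' fail on the first attempt
    and succeed afterwards. Returns True on success.
    """
    return not ("flaky" in address and attempt == 1)


def send_all(addresses, max_attempts=3):
    """Apply send_email per row, then retry only the rows that failed."""
    status = {address: False for address in addresses}  # the extra "column"
    calls = {address: 0 for address in addresses}
    attempt = 0
    while attempt < max_attempts and not all(status.values()):
        attempt += 1
        pending = [a for a in addresses if not status[a]]  # the filter
        for address in pending:
            calls[address] += 1
            status[address] = send_email(address, attempt)
    return status, calls
```

Customers whose email went out on the first pass are never contacted again; only the failed rows are retried, up to `max_attempts` times.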

Security is very important. Google enforces that all services and microservices have multiple security boundaries. For code running on the same machine as BigQuery processes, for example, user code runs in a sandbox, and the sandbox runs inside a gVisor VM. The gVisor VM has no IO stack and a very limited surface, and that is the public part of the solution. We have additional hardware, software, and network controls in place.

For managed Python you can safely assume we have at least the same mitigations in place, very robust monitoring, plus we deploy the code to Cloud Run, which sits on another cluster using a restricted configuration. For functions running in Cloud Run it is possible to access the Internet, but the user has to specify a connection configuration, which includes a service account, grant that service account the correct permissions, and make sure the VPC settings in their project allow it. If the project is configured to have Internet access, and the UDF creator has the right to create service accounts and connections and the permissions to access the Internet, then it is possible to copy the data outside Google. By default there is no Internet access, so the user has to do work to enable it.

Q4. You mentioned BigFrames would “certainly explain the limitations of BigQuery.” Let’s dig into that. What are the most significant BigQuery architectural decisions that constrain what BigFrames can do, and how do these manifest as surprising limitations for users?

For example:

Storage format constraints: BigQuery’s columnar storage and partitioning strategy presumably makes some pandas operations prohibitively expensive. What operations fall into this category? Are there pandas patterns that work fine on 10GB but break completely at 10TB due to BigQuery’s architecture?
Type system mismatches: Pandas supports Python’s dynamic typing; BigQuery has a strict schema. How do you handle cases where a pandas operation would dynamically change column types based on data content? Do you fail at query planning time, or try to infer schemas and potentially fail at execution time?
Result size limits: BigQuery DataFrames 2.0 changed allow_large_results to default to False, failing queries that return >10GB compressed data. This is a dramatic departure from pandas’ “it fits in RAM or it doesn’t” model. How do you help users understand when they’re bumping against this limit, and what patterns do you recommend for working around it (beyond just “set the flag to True”)?
Transaction semantics: Pandas DataFrames are just objects; mutations are immediate and in-memory. BigFrames operations compile to queries. What happens when users expect ACID transaction semantics (e.g., “update these 3 tables atomically”) but you’re generating separate SQL statements?

Ivan Santa Maria Filho: BigQuery is designed to support SQL, to scale to datasets with PBs of data, and to use highly optimized, controlled SQL engine operators. For what it was designed to do, it works exceptionally well. When it comes to running arbitrary user code, I believe we could do much more.

Many choices get harder at scale. The simplest one to describe is supporting the implicit ordering of rows. If you have 1GB of data, dropping an index and computing a new one will take a couple of seconds. If you have 10TB that will take longer, maybe not linearly longer, but longer. There is no magical way to fix this problem.

We could pull a page from RDBMS and use a B-Tree and clustering keys as storage, but BigQuery reads data from multiple partitions in parallel, and the data would return in random order. We could use a single partition for data frames storage, but that would limit scale and performance. It would also force a table rebuild when the index changes. We could use B-Trees and secondary indices to simulate a table scan. We could inject sort operators over a computed index column. Every option consumes time and raises the cost to our users.

We are offering the Pandas semantics by default, so users are not surprised, but also a mode more similar to what Polars and databases do. If our customers tell us this is acceptable, we would make it the default; otherwise we will continue to look for the best way to gain scale with the Pandas semantics.

The type mismatches are always a problem. Python uses duck-typing, but it also supports a very rich type system, with several Python libraries having their own data types, both simple and complex. BigQuery is strongly typed, so we cannot just pass the bytes around; we have to convert from what is stored in the BigQuery cells to something that makes sense in Python. Those conversions can be expensive, particularly if the user is applying a UDF to a column or data frame. The data will be in BigQuery and passed to the UDF row-wise or column-wise depending on the call syntax. The way that works is that BigQuery partitions the table holding the data frame data and sends each partition to a worker. This worker reads the data from our store and sends it to the worker hosting the UDF. We do what we can to optimize this step, but that does not change the fact that the data in the store is in a different encoding than what Python expects. Even timestamps have different resolutions in BigQuery.

The result set size limit has a dual purpose. Certain operations have no inherent limits other than BigQuery limits. Applying a UDF over rows will scale well, and because of that the user might not even realize they are scanning hundreds of TBs of data. That can become really expensive, and the only billing surprise we like is when the price is lower than expected. The size limit is an attempt to avoid bad surprises.

The other purpose is to avoid crashing a notebook. If the user tries to render 10GB of data points in a notebook widget, odds are that will crash the notebook. One unique problem with very large datasets and series is that one cannot just plot every point. They also cannot just naively sample the data because they might miss a maximum, minimum, or anomalous data point. We are considering adding decimation algorithms to reduce the granularity of the series but retain its shape, maybe building that into BigFrames, but ideally contributing this to an OSS project.
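
One simple family of decimation algorithms keeps the minimum and maximum of each bucket of points, so spikes survive even when most points are dropped. The sketch below illustrates that general technique in plain Python; it is not the algorithm the BigFrames team is considering, and `decimate_min_max` is a hypothetical name.

```python
def decimate_min_max(values, buckets):
    """Split the series into contiguous buckets and keep each bucket's
    minimum and maximum, in their original order."""
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    n = len(values)
    if n <= 2 * buckets:
        return list(values)  # already small enough to plot as-is
    kept = []
    for b in range(buckets):
        start = b * n // buckets
        end = (b + 1) * n // buckets
        chunk = values[start:end]
        lo = min(range(len(chunk)), key=chunk.__getitem__)
        hi = max(range(len(chunk)), key=chunk.__getitem__)
        # Keep the two extremes in the order they occur in the series.
        for i in sorted({lo, hi}):
            kept.append(chunk[i])
    return kept
```

On a series of 100 points with one high and one low outlier, five buckets return at most 10 points that still include both outliers, whereas taking every tenth point can miss them entirely.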

As far as ACID semantics go, BigFrames does not support complex transaction boundaries. There is no way to express that changes to two data frames should both be committed or not committed. That said, for a single data frame BigFrames uses a “copy on mutate” approach, writing all changes to a new “backing table” and then linking the client object to the resulting table if everything goes right. We could investigate a way to have cross-data frame transactions, but we never got that requirement.
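
The “copy on mutate” idea for a single data frame can be illustrated with a toy in-memory store. Everything below is hypothetical (`Store`, `Frame`, the `backing_N` table names) and only models the ordering of steps: write the new table first, relink the client object last.

```python
import itertools


class Store:
    """Hypothetical stand-in for server-side tables, keyed by name."""

    def __init__(self):
        self.tables = {}
        self._ids = itertools.count(1)

    def new_name(self):
        return f"backing_{next(self._ids)}"


class Frame:
    """Client object that only holds the name of its backing table."""

    def __init__(self, store, rows):
        self.store = store
        self.table = store.new_name()
        store.tables[self.table] = list(rows)

    def mutate(self, fn):
        """Write the changed rows to a new table; relink only on success."""
        new_rows = [fn(row) for row in self.store.tables[self.table]]  # may raise
        new_table = self.store.new_name()
        self.store.tables[new_table] = new_rows
        self.table = new_table  # the single "commit" step

    def rows(self):
        return list(self.store.tables[self.table])
```

If `fn` raises halfway through, the frame still points at its old backing table, so a failed mutation leaves that one frame unchanged. Nothing here coordinates two frames, which matches the limitation described above.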

Q5. Looking forward, we’re seeing an explosion of “pandas-like” APIs: Dask, Modin, Polars, BigFrames, Snowpark Python, Databricks pandas API on Spark. Is the data science ecosystem converging toward pandas as a universal interface, or are we headed for fragmentation as each implementation adds vendor-specific extensions?

More philosophically:

API surface versioning: Pandas releases new versions regularly with API changes. How does BigFrames handle pandas version compatibility? Do you target a specific pandas version, or try to track the latest? What happens when pandas adds a feature you can’t support efficiently in BigQuery?
Beyond pandas: You mentioned that BigFrames 2.0 adds multimodal capabilities for unstructured data (images, text). Pandas wasn’t designed for this. At what point does extending the pandas API for new use cases become counterproductive, and you should just design a new API that’s purpose-built for distributed, multimodal data processing?
ML integration: BigFrames includes bigframes.ml with a scikit-learn-like API for BigQuery ML. But modern ML workflows involve PyTorch, TensorFlow, Hugging Face transformers, etc. How do you see the integration of these frameworks evolving? Will we see bigframes.torch or bigframes.transformers, or is there a fundamental mismatch between these frameworks’ execution models and BigQuery’s architecture?
Standards vs. ecosystems: Would the data science community benefit from a formal standard for “distributed dataframe APIs” (similar to how SQL standardized relational queries), or is the current Cambrian explosion of implementations actually healthy for innovation?

Ivan Santa Maria Filho: For API versioning, we follow the same model the OSS community does, with major and minor versions. We are expecting many large updates from Python and Pandas this year, and we are keeping up with the changes.

My take is that the ecosystem will continue to fragment for a while, and that is not necessarily bad. We have enough innovation in this space that both clients and backends are evolving and have diverse feature sets. It is quite hard to offer a smooth, common surface across backends, without compromising performance and / or cost. By the time any industry gets to be fully standardized, that is usually the time it is also commoditized, and investment slows.

The BigQuery team added support for multi-modal data, auto-generation of embeddings, and auto-quantization of models, making extraction and inferencing way cheaper. Most data in enterprises everywhere is not structured. The amount of data stored in documents, intranet pages, email, calendars, and collaboration / chat tools is way higher than data curated in tables.

I don’t see the point of hiding this functionality from customers, but I also don’t want to pollute the Pandas API namespace. We try to be as explicit as possible, so users know what is, and what is not a Pandas default API, but we make our extensions interoperable.

For example, it is fairly easy to perform sentiment analysis on a support phone call audio recording, then join the sentiment and user data in BigQuery so a CRM application can track how happy the customer was, and what were the issues they cared about.

It is getting increasingly easy to instruct an agent to watch the general sentiment around a product and only warn us when something changes.

The development around agents makes it harder to predict the future of Pandas-like frameworks. Given the current investment level, fragmentation is a natural evolution of this space, but if we achieve an agentic solution that produces results by answering questions in English, the mechanisms to handle data will be less popular.

The agents themselves will need a language to express what they want, but the number of direct active users might go down drastically. We might finally end up with something similar to the Star Trek Enterprise computer, and at that point I just don’t see a regular data scientist or business analyst writing Python directly.

…………………………………………………………………………

Additional Context for ODBMS.org Readers:

What is BigFrames? BigQuery DataFrames (BigFrames) is an open-source Python library that provides a pandas-compatible API for analyzing data stored in BigQuery. Unlike pandas, which loads data into local memory, BigFrames translates operations into BigQuery SQL, enabling data scientists to work with terabyte-scale datasets using familiar pandas syntax.

Why does this matter? Most data scientists learn pandas, but pandas doesn’t scale beyond single-machine memory limits. BigFrames (and competitors like Databricks pandas API, Snowpark Python) represent a new generation of tools that preserve familiar APIs while transparently distributing computation. Understanding the tradeoffs in these systems helps organizations choose the right tools and helps researchers understand the limits of API compatibility.

Key Technical Innovation: BigFrames uses a transpilation approach: pandas operations → Ibis intermediate representation → SQLGlot SQL generation → BigQuery execution. This allows Google to avoid directly bundling pandas code while maintaining API compatibility – a fascinating case study in software architecture and licensing strategy.

……………………………..

Follow us on X

Follow us on LinkedIn

Feb 9 26

On AI and the Future of Rail Systems: Interview with Roland Edel

by Roberto V. Zicari

“AI reshapes rail jobs by reducing repetitive tasks and giving staff more responsibility for decision‑making. It also enables engineers and project teams to focus more on innovative and creative work, as well as to deliver complex rail projects on time and on budget. Technicians work increasingly data‑driven, dispatchers make better‑informed decisions, and drivers gradually move into supervisory roles for automated systems.”

Q1. As CTO of Siemens Mobility, you oversee one of the world’s most critical transportation infrastructure portfolios. When you look at the global rail industry today, where do you see AI and advanced algorithms creating the most transformative opportunities—not just for operational efficiency, but for fundamentally reimagining how rail systems serve cities and nations? What convinced you that AI was no longer optional but essential for the future of mobility?

Roland Edel: Data and Artificial Intelligence already make rail transport faster, more stable and more reliable—often without passengers even noticing. Today, AI detects early deviations in vehicles and infrastructure, analyses camera data and prevents disruptions before they materialize.

The next major step in the long run is Driverless Train Operations (DTO) with a Grade of Automation (GoA) 3 in mainline operations. In earlier projects such as BerDiBa and safe.trAIn, we developed foundational technologies that we are now applying in current projects like R2DATO and RemODtrAIn. Here, we are shaping the transition from semi‑automated operations (GoA2), including our ATO over ETCS project with S‑Bahn Hamburg, to fully automated operations (GoA4) or remote operations in stabling areas.

This requires close integration of onboard intelligence, sensors, digital infrastructure and signalling. These technologies lay the foundation for a system that can scale reliably even as demand grows.

For me, the turning point in our automation projects came when data on optimized train planning and energy savings made one thing unmistakably clear: analytics, algorithms and AI deliver tangible operational benefits—from more efficient planning to reduced energy consumption and more stable performance.

Q2. Many industries struggle to move AI initiatives from successful pilot programs to enterprise‑wide implementation. Rail systems are particularly complex—they involve safety‑critical operations, legacy infrastructure, multiple stakeholders, and regulatory frameworks that prioritize reliability above all else. What have been the biggest organizational and operational challenges you’ve encountered in scaling AI applications across Siemens Mobility’s rail portfolio, and how have you approached the tension between innovation and the rail industry’s paramount focus on safety?

Roland Edel: Scaling AI in the rail domain works only if we are able to incorporate safety‑critical functions into our innovations. Safety logic remains deterministic and certified; AI is added only where it is fully verifiable. Deployment follows a stepwise approach: first in depots, then in shunting areas, and later on the mainline.

Projects such as AutomatedTrain and others, in which we collaborate closely with an ecosystem of external partners, demonstrate how essential robust error detection and sensor fusion are for ensuring safe perception in open environments. At the same time, modern tools allow us to update safety‑relevant software during ongoing operations, keeping systems updated without compromising availability.

This combination—clear boundaries, strong diagnostics and incremental rollout—has proven to be the right way to balance innovation with the industry’s uncompromising safety culture. Finally, it all comes down to people: we can only scale AI when we train our employees accordingly and embed data and AI into all our processes.

Q3. AI is only as good as the data it learns from. Rail systems generate enormous amounts of operational data, but often in silos. From a leadership perspective, what does it take to build the data infrastructure that makes AI in rail reliable? How do you convince diverse stakeholders to share and standardize data?

Roland Edel: Trustworthy AI requires trustworthy data across the entire lifecycle of a rail system. That is why we increasingly rely on digital twins that connect design, engineering, manufacturing, operations and servicing. From the first CAD model to condition‑based maintenance and real‑time operations, a digital twin ensures that data remains consistent, interoperable and available wherever it is needed.

Open interfaces, standardized data models and federated platforms make this possible in practice. Our Railigent X suite plays a central role by integrating engineering data, vehicle data, infrastructure information and operational insights, while keeping operators in full control of their data.

When lifecycle data becomes interoperable, system availability improves, analytics become more precise, and the entire network operates more reliably and economically. And this is where stakeholders become convinced: when real projects demonstrate better services, higher reliability, improved cost structures and full data sovereignty. Once these benefits are visible, data collaboration stops being a hurdle and becomes an accelerator for innovation.

Q4. Predictive maintenance is often cited as AI’s ‘killer application.’ What is the realistic business case, and what has surprised you most about what it takes to make it work?

Roland Edel: Predictive maintenance delivers measurable business value: higher availability, reduced lifecycle costs and more efficient maintenance planning. AI uncovers patterns that humans cannot detect and enables precisely timed interventions.

What surprised me most was that cultural change often matters more than the algorithms themselves. Teams need to take the predictions into account, understand their implications and adapt work processes accordingly. Financially, the payoff is significant but requires patience—it is a long‑term investment.

The next step is what we call Predictive Availability, where entire functional chains—not just single components—remain stable. This includes linking data from incident reports, diagnostics, measurements, visual inspections and operational context into one lifecycle digital twin. This system understanding allows AI to anticipate disruptions earlier and more reliably.

The approach works well already, but its full potential depends on even closer collaboration across the ecosystem.

Q5. The rail industry is exploring different levels of automation. What framework do you use to decide what to automate first, and how do you balance safety, public trust and workforce concerns?

Roland Edel: We automate according to a clear framework: start where the environment is controlled and the benefits are greatest. Depots are ideal—they offer structured, repeatable processes with high potential for efficiency gains. Automation then moves to stabling and shunting yards, supported by AI‑driven obstacle detection and remote operation. From there, automation can be extended progressively.

At the same time, the human role remains central. Rare, complex edge cases are still best handled by experienced staff, so automation supports people rather than replaces them. Public trust grows when the benefits are transparent (greater safety, greater punctuality, fewer routine tasks) and when rollout is gradual. Each phase builds experience and confidence for the next.

Q6. Rail is already energy efficient. How big is AI’s role in sustainability, and how do you manage trade-offs?

Roland Edel: AI is one of the strongest levers for energy efficiency in rail transport. Automated driving profiles reduce energy consumption, maximize regenerative braking and minimize wear. AI‑based timetable optimization smooths traffic flows and prevents unnecessary stop‑and‑go patterns. To unlock these benefits across the entire network, data from vehicles, infrastructure and operations must be integrated. That is why we have introduced Siemens Xcelerator principles across our portfolio—Railigent X, Signaling X and the Mobility Software Suite X—to create modular cloud‑based software, interoperable APIs and an open ecosystem. Trade‑offs between energy efficiency and service frequency can be managed intelligently: AI enables the optimization of both simultaneously by balancing demand, capacity and operational constraints in real time.

Q7. AI and automation raise important questions about the future of work in rail. How do you approach workforce concerns, and what skills will be needed?

Roland Edel: AI reshapes rail jobs by reducing repetitive tasks and giving staff more responsibility for decision‑making. It also enables engineers and project teams to focus more on innovative and creative work, as well as to deliver complex rail projects on time and on budget. Technicians work increasingly data‑driven, dispatchers make better‑informed decisions, and drivers gradually move into supervisory roles for automated systems.

To support this shift, we invest in targeted training: digital learning platforms, simulation environments and hands‑on programs that build confidence in new tools. AI does not eliminate jobs; it modernizes them, creating more attractive, safer roles with clearer career perspectives.

Q8. Rail is heavily regulated. How do you work with regulators to build confidence in AI, and how do you earn public trust?

Roland Edel: Regulators are rightly accustomed to deterministic, fully explainable systems. We therefore involve them early—long before an AI‑based function enters the approval process. Together with our partner ecosystem, we develop methods to make AI systems traceable, testable and auditable, including virtual testbeds, robust perception validation and hybrid architectures that ensure safety‑critical logic remains reliable and predictable.

The overall system must remain predictable, and every AI‑supported decision must stay within defined boundaries. Continuous monitoring is essential: sensors and algorithms must detect when they deviate from expected performance and transition into safe states. Public trust grows through transparency, real‑world performance and a phased introduction—starting in controlled environments like depots and only later in passenger service.

Q9. Looking ahead to 2030, what does a realistic AI‑enabled rail system look like? And what challenges keep you up at night?

Roland Edel: By 2030, AI will be an almost invisible yet essential part of rail operations. Passengers will benefit from more reliable services, clearer information and smoother journeys. Data and AI will also enable highly personalized mobility services—from multimodal Mobility‑as‑a‑Service offerings to AI‑powered travel companions that proactively guide passengers throughout their journey.

Operators will rely on cloud‑based signaling, automated depots, predictive maintenance and digital supply chains. The system will become more resilient, flexible and climate‑friendly, and new applications will emerge. Three challenges remain. First, regulation and standards must evolve quickly enough to keep pace with innovation while maintaining safety. Second, the industry needs broader data and architecture harmonization across operators, suppliers and infrastructure owners. Third, workforce transformation must accelerate to align skills with new technologies.

To shape the Data & AI transformation in rail, we must open our data and platforms, modularize software, build digital twins and trustworthy industrial AI, strengthen ecosystem partnerships and accelerate deployment with confidence and purpose.

………………………………………………………………………………………………………

Roland Edel has been Chief Technology Officer and Head of Technology & Innovation at Siemens AG’s Mobility & Logistics Division in Munich since 2011. Since October 2014 the Division has operated under the name Mobility.

After joining Siemens AG in Erlangen in 1993 as a design and development engineer at Transportation Systems, Roland Edel went on to assume various managerial roles within the former Electrification Division between 1996 and 2003. From 2003 onwards he was responsible for Engineering, Development and Product Management within the Business Unit Rail Electrification for five years. Roland Edel subsequently took charge of engineering and development within the newly formed Business Unit Turnkey, Electrification and Transrapid in Erlangen, before moving on to assume the position of Chief Technology Officer and Head of Innovative Mobility Solutions in the Business Unit Complete Transportation in 2009.

Resources:

– Digital Transformation for Rail, Siemens Mobility.

……………………………..

Nov 25 25

Twenty Years of Conversations: Reflections on Technology and Society

by Roberto V. Zicari

By Roberto V. Zicari, Editor, ODBMS.org

“Because ultimately, what these twenty years of dialogue have taught me is that technology is never just about the technology. It’s about us, and the world we choose to build together.”

When I launched ODBMS.org in 2005, the technology landscape looked remarkably different. Object databases were the conversation. SQL versus NoSQL was a heated debate. The cloud was still a meteorological term for most developers. Twenty years and hundreds of interviews later, what strikes me most isn’t just how much technology has changed, but how profoundly it has reshaped the questions we ask.

In those early years, our conversations centered on technical elegance—data models, query optimization, transactional consistency. We debated whether object-relational mapping would bridge two worlds or create new complexities. These were important questions, but they were questions about technology itself.

Today’s conversations reveal a different world. When I interview leaders now, we discuss trust frameworks for AI in clinical care, the societal implications of real-time data streams that move billions of dollars in milliseconds, the responsibility that comes with systems that make life-or-death healthcare decisions. The technology hasn’t just gotten faster or more powerful—it has become deeply embedded in the fabric of human decision-making.

This evolution reflects something fundamental: we’ve moved from asking “Can we build this?” to asking “Should we build this?” and “What happens when we do?” The practitioners I’ve spoken with over two decades—from Vinton Cerf discussing internet governance to recent conversations about AI ethics and trustworthy systems—increasingly grapple with questions that transcend engineering.

The patterns that emerge from twenty years of dialogue are striking. First, the acceleration is real and relentless. A database professional from 2004 measuring latency in hundreds of milliseconds would be stunned by today’s nanosecond-level systems. But speed alone tells an incomplete story. What matters more is the expanding scope of impact. Systems that once managed business transactions now influence medical treatments, shape financial markets, and mediate human knowledge.

Second, every technological breakthrough creates new responsibilities. The Big Data revolution promised insights; it delivered privacy challenges. Cloud computing promised accessibility; it raised questions about data sovereignty. Generative AI promises creativity; it demands frameworks for attribution, bias, and trust. Each wave of innovation brings not just solutions but new ethical territories to navigate.

Third, the gap between possibility and wisdom persists. We can build systems of remarkable sophistication, yet we struggle with governance, interpretability, and equitable access. The technical challenges we once obsessed over—scalability, performance, reliability—now seem almost quaint compared to the societal challenges of ensuring technology serves humanity rather than destabilizing it.

Perhaps most significantly, I’ve watched the democratization of technology amplify both its potential and its risks. Open source movements have accelerated innovation beyond what any single corporation could achieve. Yet this same openness means that powerful capabilities spread faster than our collective wisdom about their use.

Looking back through twenty years of expert articles and interviews, I see an arc from technical optimism to responsible pragmatism. The pioneers I spoke with in 2005 were building the future with enthusiasm and relatively few constraints. Today’s innovators build with one eye on capability and another on consequence. They think not just about systems that work, but about systems that work for society.

The database and data management community has always been at the intersection of possibility and reality. We store, structure, and serve the information that powers decisions. Now, as that information flows through AI systems and influences outcomes at unprecedented scale, our responsibility extends beyond technical excellence to social awareness.

As ODBMS.org enters its third decade, we are more committed than ever to addressing these pressing issues head-on. The portal has evolved to tackle the urgent questions emerging from the generative AI era—questions about trustworthy AI systems, responsible deployment, bias and fairness, data provenance, and the governance frameworks needed for AI in critical domains like healthcare and finance. Our conversations now explore not just how these systems work, but how we ensure they work ethically and equitably.

The core mission remains: to create a space where practitioners, researchers, and leaders can share not just their technical insights, but their wisdom about building technology that serves human flourishing. In this new era of generative AI, that mission has never been more vital. Because ultimately, what these twenty years of dialogue have taught me is that technology is never just about the technology. It’s about us, and the world we choose to build together.

Nov 10 25

Community Over Code: Ruth Suehle on Leading The Apache Software Foundation into the Future

by Roberto V. Zicari

“Open communication, consensus, and collaboration are the heart of The Apache Way and always have been. That’s why you hear us say ‘community over code.’”

Foundation Mission & Leadership

Q1. As President of The Apache Software Foundation, you’re leading one of the world’s most influential open-source organizations at a particularly dynamic moment in technology history. Can you share your vision for ASF’s mission today and how it has evolved? What does “The Apache Way”—the foundation’s collaborative, consensus-driven approach to software development—mean in 2025, and why do you believe this methodology remains vital as the software landscape becomes increasingly complex and commercially driven?

Ruth Suehle: The ASF has been around for more than 25 years, which has given us a lot of time with developing software collaboratively, and plenty of lessons learned along the way. The Apache Way is the name for our time-tested approach to open source development, but it’s not a set of policies or demands. We have hundreds of projects, each with their own culture, activities, and stage of development. As a whole, however, the ASF’s long-held belief is that open source software thrives best when it remains independent of any single or dominant commercial interests. The Apache Way gives all of those diverse projects a framework for maintaining neutrality and independence. This ensures that our projects serve the broader community.

It’s built around a few concepts, the first of which leads the rest, and that is earned authority. The ASF is built on a web of trust and publicly earned merit, which does not expire. The community is entirely volunteer-based (though of course many are paid by companies to work on projects housed at The ASF, as they are for any code-producing foundation), and votes are all equal.

Open communication, consensus, and collaboration are the heart of The Apache Way and always have been. That’s why you hear us say “community over code.” A strong and healthy community comes first, because a good community can fix bad code, but good code can’t heal a struggling community.

Q2. The Apache Software Foundation oversees hundreds of projects spanning everything from web servers to big data platforms to AI/ML frameworks. Looking across this diverse portfolio, what are the common threads or emerging patterns you’re seeing? Are there specific technical domains or project types where you’re seeing the most energy, innovation, or community growth? And conversely, are there areas where ASF projects face particular sustainability or relevance challenges?

Ruth Suehle: We actually map projects by category at projects.apache.org, so anyone is welcome to take a look and see where things lie today. What you mostly won’t see reflected there, however, are our projects in the Incubator, which is how new projects come into the foundation. The newest things there at any given time are likely to be reflections of broader trends in technology, and right now the latest additions are largely data-related.

It’s worth noting the other end of the lifecycle, as well: the Apache Attic. This is how we officially retire and archive projects, and it’s an important feature for the foundation and how we support a full project lifecycle. By ensuring transparency and providing a formal process for projects that are no longer under active development, the Attic acts as a historical archive, moving projects to a read-only state to preserve their code and documentation for users, while ceasing new development and providing limited oversight to allow for future maintenance if needed.

As for sustainability, I see this not as an ASF challenge or that of a particular project, but as a difficulty facing the entire open source ecosystem right now. I’ve given talks and led panels at a few events in the last year on the subject. It was a significant topic at this year’s Open Source Congress. When you say “sustainability,” people tend to hear “funding,” and that is an important factor, but it’s more complicated than just money. That said, complying with coming regulatory changes, notably the Cyber Resilience Act (CRA), is going to impose significant additional costs on open source projects and foundations. This year we launched our Tooling Initiative to address those concerns, and it’s the first of our ASF Initiatives, which offer targeted sponsorships for specific needs.

Current Projects & Strategic Directions

Q3. Apache has been foundational to the big data revolution with projects like Hadoop, Spark, Kafka, and Flink. As we move into the GenAI era, how are these established projects evolving to serve new workloads and use cases? Are you seeing Apache projects positioning themselves as critical infrastructure for AI applications—for instance, in data pipelines feeding LLMs, vector databases, or real-time inference systems? What role do you envision Apache projects playing in the broader AI infrastructure stack?

Ruth Suehle: Apache projects are not just evolving for the GenAI era—they are actively positioning themselves as critical infrastructure for AI applications, particularly in the domain of data pipelines, real-time context, and orchestration. The shift is from “batch big data” to “real-time, contextualized data streams” that feed LLMs and power real-time inference.

As you state, existing ASF projects are already well-positioned to plug right into the AI ecosystem. Apache Kafka can act as a mission-critical data fabric for generative AI applications, while Apache Flink’s focus on stateful, low-latency, and event-time stream processing is ideal for AI workflows. Apache Spark, Apache Airflow, and Apache Beam all fit well as tools to manage tasks like large-scale data preparation, workflow orchestration, and data abstraction. In 2023, Apache Pinot added support for real-time vector ingestion to enable similarity search as a real-time operation, addressing the need for immediate updates in generative AI pipelines. So Apache projects are not just migrating their existing functionality; they are fundamentally being adapted to own the data layer within AI infrastructure stacks.

Q4. Beyond the well-known flagship projects, what are some emerging or underappreciated Apache projects that you’re particularly excited about? Are there incubating projects or recent graduates from the Apache Incubator that you believe represent important directions for the foundation? What makes these projects significant, and what do they tell us about where the Apache community sees future opportunities?

Ruth Suehle: I can’t even pick favorite songs and movies, much less favorite projects! But seriously, this question is more like picking which of your children you think is the most promising. A huge part of our underlying ethos and governance at the ASF is supporting all projects equally and encouraging all of our projects to be as successful as possible. Their independence and unique communities, coupled with the incredible innovation we tend to see across all open source projects, means that any of our Incubator projects have the potential to bring significant innovation and advancement in their areas.

Q5. As President, what specific directions would you personally like to move The Apache Software Foundation forward? Are there strategic initiatives—whether technical, organizational, or community-focused—that you’re championing? This could range from attracting new types of projects, expanding global community participation, improving project sustainability models, or addressing gaps in the open-source ecosystem that ASF is uniquely positioned to fill.

Ruth Suehle: I mentioned earlier that when people hear “sustainability,” they often hear “money,” but it means other things as well. Fundamentally, sustainability is “what do we need to do to ensure the success of the open source ecosystem for decades to come?” One of the biggest changes I’ve seen in the last two or three years is a highly beneficial one, and that is a move towards more collaboration across the foundations, industry, and project communities. These groups have spent many years working largely as silos, which was fine when the work was all about individual software projects, but we’re facing more and more issues that are best solved by doing the thing that we all know best–collaboration. For The ASF, participating in groups like the Eclipse Foundation’s Open Regulatory Compliance Working Group, acting in our role as an Open Source Initiative Affiliate member, and working through partnerships like the one we have with Alpha-Omega all help us reach solutions to common problems the open source way instead of constantly reinventing the wheel. Earlier this year, I was elected to the OSI board to represent the OSI’s Affiliate members, and I think the OSI’s work to bring together organizations through the Affiliate program and things like the Open Policy Alliance is a great example of the kind of cooperation that is not only the way forward for the entire ecosystem, but critical to continued success.

Another important piece of change we need for sustainability is doing a better job of growing a talent pipeline in open source. “Open source” got a lot of mainstream press for about 3 years after the term was coined in 1998, and then we all rather quietly built this massive ecosystem, again largely in silos. In 2025, that code is quite literally running the world, and there’s a lot more of it than there used to be. There are larger needs around it than there used to be. But the pool of maintainers has not grown at the same rate, and one place I think we really failed in all of open source is making sure we were bringing in new talent to keep up with the pace that we were creating at. We have plenty of room for improvement in preparing the next generation, and we have to keep building our people.

Simply put, we have a mentorship problem. I believe a large reason for that is that those who built open source software in the early years were doing exactly that–building from scratch. They may have had mentors in writing code, but they didn’t have mentors in open source, because they were writing the playbook as they went. As a result, they also didn’t have mentors in mentoring, i.e., a model to look to when mentoring the next generation of open source contributors. There are still a lot of folks around who have been here since roughly 1998, when the term “open source” was coined, or shortly thereafter. I don’t like the math, but the fact is that those people are retiring (or at least might like to one day!), and when I look around the room at events and on mailing lists, I’m not seeing enough new faces to keep up.

Future Vision & Community

Q6. Looking ahead three to five years, what does success look like for The Apache Software Foundation under your leadership? How do you want the foundation to be positioned relative to the major technological shifts we’re experiencing—not just GenAI, but also cloud-native architectures, edge computing, quantum computing, and emerging regulatory frameworks around software supply chain security and AI governance? What legacy or impact do you hope to achieve during your time as President, and what would you say to technologists, organizations, or students who are considering getting involved with Apache projects or The Apache Way of building software?

Ruth Suehle: There are a few important things coming in the next few years, and none of them are about specific technologies. New technologies are exciting, of course, but part of the reason they’re exciting is because they come and go. So the best thing we can do as a foundation is provide a solid structure for any project to build a community and a healthy open source project. We also need to keep making the technical improvements that will help them and their users, like the work we’re doing to build a foundation-wide release process and tooling infrastructure that enable ASF projects and incoming Incubator projects to fully comply not only with the CRA, but with all of the new regulations developing around the world.

If it’s not already obvious, the best thing I think The ASF can do, and the best way I can help as president, is to set an example for how to build good communities, both within our own foundation and in our collaboration with others. And the best thing that anyone who cares about the future of open source can do right this minute is not to write more code (which we’ll keep doing anyway), but to go find another person and turn them into a contributor, keeping in mind that the ecosystem is now vast and needs a lot more variety of skills than just writing code. For my part, I am always happy to share what I know, because hoarding knowledge helps no one. I frequently end talks by telling people if there’s anything I know that can help you, whether that’s finding ways to contribute, learning about how to bring your project into The ASF, starting an OSPO, or even making stellar baked goods, please reach out, and that goes for any reader here. Community over code thrives with each one of us building a little more community (and baked goods certainly never hurt!).

……………………………………………..

Ruth Suehle is the director of the open source program office at SAS, an analytics, data management, and AI software company. She is also president of the Apache Software Foundation and a member of the Open Source Initiative (OSI) board of directors. Ruth has helped build open source communities for nearly two decades, much of which she spent in the Open Source Program Office at Red Hat. Co-author of Raspberry Pi Hacks (O’Reilly, December 2013) and previously editor of Red Hat Magazine and opensource.com, Ruth is a writer and core contributor at GeekMom.com.

……………………………..


Nov 4 25

On Database Query Performance in HeatWave and MySQL. Interview with Kaan Kara

by Roberto V. Zicari

“Of course, in practice, no query optimizer is perfect and there will be edge cases where the way a query is written will impact its performance.”

Q1. What are your current responsibilities as Principal Member of Technical staff?

Kaan Kara : I am contributing as the tech lead for query execution in HeatWave. My main responsibility is implementing new features in HeatWave, maintaining its stability, and supporting our customers with their HeatWave-related use cases.

Q2. Let’s talk about improving database query execution time. The way a query is written has a massive impact on its performance, and developers often face hurdles in structuring them optimally. What is your take on this?

Kaan Kara : SQL is a declarative language. That means, in ideal terms, the database optimizer should produce the best query plan possible to answer the query, no matter how it is written. So, there should not be a need to optimize queries at SQL level. This is what we strive for when designing optimizers. Of course, in practice, no query optimizer is perfect and there will be edge cases where the way a query is written will impact its performance. I believe there are two practical ways a database service can help address this. The first approach is providing insights into the query plan and its execution. Our goal is to offer detailed and understandable insights about the query plan to our customers, so that they can see where the bottlenecks are. Once they see the bottleneck, they can think about how the query can be rewritten, whether certain optimizer hints could help, and so on.

Secondly, it is important that the database itself provides alternative execution schemes or user-guided optimization methods. For instance, we recently introduced materialized temporary tables in HeatWave. Once the user sees that a certain query subtree is taking a long time, they can decide to create a materialized temporary table on it, substantially accelerating their queries.

Q3. Indexing is the most common and effective way to speed up queries. What are the major sources of challenges developers face?

Kaan Kara : Indexes come with maintenance cost, and they are often used without proper analysis of the trade-offs between that cost and the performance benefit they provide. HeatWave, with its in-memory columnar data architecture, helps eliminate the need for most indexing in analytical workloads. However, there are certain use cases where indexes provide value. One example is vector embedding-based nearest neighbor search, where index-based lookup is needed to ensure low response times. After introducing the native VECTOR type last year, we recently introduced VECTOR-based indexing in HeatWave, enabling our customers to run approximate nearest neighbor search queries up to 2 orders of magnitude faster. One interesting direction we took was that we did not want to sacrifice result fidelity. We are employing a novel method that utilizes the index only when we believe the results it produces will be accurate.

Q4. Sometimes, the problem isn’t the query itself but the foundation it’s built on. Can you share your experience with this?

Kaan Kara : That is a very good point. Schema design plays a critical role in performance optimization. In some use cases, we see queries with predicates based on complex string operations or regular expressions, which make the query much slower than if the same predicate were applied to numeric columns. But this ties back to ease of use and declarative nature of interacting with databases. Ideally, the user should not have to worry about these things and do the most convenient thing, and the database should take care of optimizations behind the scenes.

In HeatWave, we strive to achieve this goal guided by real-world use cases from our customers. For example, we often observe read-heavy workloads repeatedly running the same expensive query subtree. To address this, we are developing an automated result cache that can materialize this subtree result within HeatWave and use it later when it is needed. We believe this feature will significantly improve query performance in many scenarios.

Q5. In a real-world application, a query doesn’t run in isolation. The performance of MySQL is heavily dependent on its configuration. What are your recommendations here?

Kaan Kara : That is true. Thankfully, we have a set of features in our Autopilot suite, which eliminate much of the configuration guesswork. For instance, depending on the user’s data and sample queries, Autopilot suggests the correct cluster size, data placement key, appropriate column encodings, and much more. But configuration is usually not a one-and-done exercise. Users’ data and queries change over time. So, it is also crucial to provide detailed insights into the system consistently, so that adjustments can be made. An example is the need for efficient compute up and down scaling. Some customers require more compute in their peak operating hours for faster queries. In HeatWave, we provide zero downtime compute elasticity (YouTube video), thanks to our partitioning-based data architecture to cater for that need.

Q6. Beyond query-level tuning, what are the most significant architectural challenges that impede query performance, such as handling I/O bottlenecks from large table scans, managing inefficient data access patterns caused by normalization choices, or addressing network latency in distributed database environments?

Kaan Kara : This is a great question and one of the core things that we deal with daily when optimizing the HeatWave query execution engine. For an efficient distributed analytics engine, optimizing for I/O bottlenecks (for HeatWave, this means primarily memory and network) is at the top of the priority list. HeatWave has many optimizations to reduce these bottlenecks. For instance, we utilize an efficient vectorized bloom-filter to reduce the amount of probe-side data that we need to shuffle around in our cluster when performing a distributed join.
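Editor’s illustration: the following is a minimal Python sketch of the general idea of Bloom-filter pre-filtering for a join, not HeatWave’s implementation. The class `BloomFilter`, the function `prefilter_probe_side`, and the sizes used are hypothetical, and the sketch is neither vectorized nor distributed. It only shows why fewer probe-side rows need to be moved: rows whose keys cannot match the build side are dropped before any shuffle.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def prefilter_probe_side(build_keys, probe_rows, key_of):
    """Keep only probe rows whose join key might exist on the build side."""
    bloom = BloomFilter()
    for key in build_keys:
        bloom.add(key)
    return [row for row in probe_rows if bloom.might_contain(key_of(row))]


# Hypothetical data: a small build side and a much larger probe side.
build_keys = [1, 2, 3]
probe_rows = [(k, f"payload-{k}") for k in range(1000)]
to_shuffle = prefilter_probe_side(build_keys, probe_rows, key_of=lambda r: r[0])
```

Only the rows in `to_shuffle` would need to be sent across the network. The join still has to verify each match, because a Bloom filter can let a few non-matching rows through.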

Driven by our customer workloads, recently we have worked on a late-materialization feature. Our customers work with wide string columns frequently. To reduce frequent access to these, we perform a transformation in our logical plan: Any wide columns that are not needed are removed from leaf table scan nodes; instead, we project the primary keys. Later in the plan, we introduce additional joins utilizing these primary keys to gather the wide columns that the query needs to produce the result. This feature will improve performance for certain production queries which project many wide columns by a significant amount.
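Editor’s illustration: the plan transformation described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HeatWave code. The table `products`, its columns, and the helpers `scan_narrow` and `late_materialize` are hypothetical. The leaf scan carries only the primary key and narrow columns, and the wide column is fetched by a join on the primary key for the rows that survive filtering.

```python
# Hypothetical table: primary key -> row with one narrow and one wide column.
products = {
    1: {"price": 5, "description": "long text A " * 50},
    2: {"price": 50, "description": "long text B " * 50},
    3: {"price": 70, "description": "long text C " * 50},
}


def scan_narrow(table, columns):
    """Leaf scan that projects the primary key plus the requested narrow columns."""
    for pk, row in table.items():
        yield {"pk": pk, **{c: row[c] for c in columns}}


def late_materialize(rows, table, wide_columns):
    """Join back on the primary key to fetch wide columns for surviving rows only."""
    for row in rows:
        base = table[row["pk"]]
        yield {**row, **{c: base[c] for c in wide_columns}}


# Filter using narrow columns only; wide strings are not touched at this stage.
survivors = [r for r in scan_narrow(products, ["price"]) if r["price"] > 40]

# Late in the plan, gather the wide column for the rows that remain.
result = list(late_materialize(survivors, products, ["description"]))
```

In this toy run the wide `description` column is read for two rows rather than three. The saving grows with the selectivity of the operators that sit between the scan and the final projection.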

Q7. Specifically, as of MySQL 9.3.0, it is possible to create temporary tables that are stored in the MySQL HeatWave Cluster. What are these tables used for?

Kaan Kara : Yes, our customers can now create temporary tables directly within HeatWave, as in-memory materialized tables. Previously, the only way to load a table into HeatWave was through loading an InnoDB table or loading an external table from object storage. But sometimes, users want to store the result of a query as a temporary materialization without going through the load path, which can be a bottleneck.

Q8. Are these tables similar to conventional database views?

Kaan Kara : They are very similar to materialized views, but temporary tables are static. So, changes in the base tables will not be propagated and temporary tables themselves cannot be changed. If the customer use case requires change propagation from base tables, then materialized views are the right approach, which will be supported soon in HeatWave.

Q9. Can you please explain how these MySQL HeatWave temporary tables help reduce query execution time?

Kaan Kara : Let me give an example: Consider an analyst investigating the transactions on a certain publicly traded stock. The queries will need to perform a join between “stocks” and “transactions” tables on some stock-id, followed by further aggregations (getting volume by date) or maybe further joins and ordering (sorting by largest buyers in each timeframe) etc. In this example, the initial join between “stocks” and “transactions” needs to be performed repeatedly and can be an expensive part of the queries. The analyst can now create a materialized temporary table based on the result of this join directly within HeatWave and it can be used later as much as needed by other operations.
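Editor’s illustration: the pattern in this example can be tried with any SQL engine that supports temporary tables. The sketch below uses Python’s built-in `sqlite3` module as a stand-in. It does not show HeatWave’s syntax or options, and the schema, the sample rows, and the table name `acme_trades` are all hypothetical. The point is only that the join runs once and later queries read the materialized result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks (stock_id INTEGER PRIMARY KEY, ticker TEXT);
    CREATE TABLE transactions (
        stock_id INTEGER, buyer TEXT, trade_date TEXT, volume INTEGER
    );
    INSERT INTO stocks VALUES (1, 'ACME'), (2, 'INIT');
    INSERT INTO transactions VALUES
        (1, 'alice', '2025-01-02', 100),
        (1, 'bob',   '2025-01-02', 250),
        (1, 'alice', '2025-01-03', 50),
        (2, 'carol', '2025-01-02', 999);

    -- Materialize the repeated join once as a temporary table.
    CREATE TEMPORARY TABLE acme_trades AS
        SELECT t.buyer, t.trade_date, t.volume
        FROM stocks s JOIN transactions t ON s.stock_id = t.stock_id
        WHERE s.ticker = 'ACME';
""")

# Follow-up queries reuse the materialized join instead of recomputing it.
volume_by_date = conn.execute(
    "SELECT trade_date, SUM(volume) FROM acme_trades "
    "GROUP BY trade_date ORDER BY trade_date"
).fetchall()

largest_buyers = conn.execute(
    "SELECT buyer, SUM(volume) AS total FROM acme_trades "
    "GROUP BY buyer ORDER BY total DESC"
).fetchall()
```

As noted in the answer to Q8, such a table is static: later changes to `stocks` or `transactions` would not appear in `acme_trades`.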

Q10. Is calculating the load factor, i.e. measuring how full a hash table is, really a good metric for estimating query execution times? Or are there other metrics that need to be taken into consideration?

Kaan Kara : By itself, it is a narrow metric and only relevant to figure out a single join’s or an aggregation’s cost. During our physical compilation, this metric contributes to our cost estimation indirectly: Depending on a join’s build side cardinality or a group-by’s output cardinality, we choose an appropriate hash table size. This size then dictates the runtime and memory cost of each operation. To estimate the query cost holistically, all relational operators along with how much data will be moved around are then considered.
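Editor’s illustration: the relationship between estimated cardinality, hash table size, and load factor can be sketched as follows. The power-of-two sizing rule and the 0.5 target load factor are assumptions chosen for the example, and the function names are hypothetical. HeatWave’s actual sizing policy is not described in the interview.

```python
def hash_table_slots(estimated_cardinality: int, max_load_factor: float = 0.5) -> int:
    """Smallest power-of-two slot count that keeps the load factor at or below the target."""
    if estimated_cardinality < 0:
        raise ValueError("cardinality must be non-negative")
    if not 0 < max_load_factor <= 1:
        raise ValueError("max_load_factor must be in (0, 1]")
    slots = 1
    while slots * max_load_factor < estimated_cardinality:
        slots *= 2
    return slots


def load_factor(entries: int, slots: int) -> float:
    """Fraction of the hash table that is occupied."""
    return entries / slots


# A join whose build side is estimated at 1,000 distinct keys.
slots = hash_table_slots(1000)
print(slots, round(load_factor(1000, slots), 3))  # prints: 2048 0.488
```

The sketch also shows why the metric is narrow: it sizes one operator’s hash table from one cardinality estimate and says nothing about the other operators in the plan or the data moved between them.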

Q11. What is your next project you wish to work on?

Kaan Kara : My next projects are around automatic maintenance of materialized views within HeatWave. This entails automatic substitution and creation of materialized views. We are excited to share more soon.

………………………………………………………

Kaan Kara is a principal member of technical staff at Oracle, working as a lead developer mainly responsible for query execution in HeatWave MySQL.

As part of the HeatWave team, he has led multiple projects that substantially improved the performance and the memory efficiency of the query execution engine. A sample of these projects includes pipelined relational operator execution, bloom-filter enhanced distributed joins, base relation compression, and late decompression optimizations. Collectively, these improvements led to factors of geomean reduction in analytical benchmarks, such as TPC-H and TPC-DS, while reducing the memory requirements of the in-memory execution engine, enabling a single HeatWave node with 512GB memory to run the 1TB TPC-H benchmark in full.

More recently, he was the lead developer introducing the new VECTOR type to MySQL, along with highly optimized vector processing functions within HeatWave, laying the data layer foundation that enabled highly anticipated vector store features within HeatWave, such as semantic search and retrieval-augmented generation.

Prior to joining Oracle, Kaan received his doctoral degree in 2020 from ETH Zurich, Systems Group in Computer Science Department. His research focused on using reconfigurable hardware devices (FPGAs) to accelerate data analytics. He has published papers in top database venues such as VLDB and SIGMOD, showcasing the potential benefit of FPGA-based implementations for data partitioning and in-database machine learning tasks.

Resources

On HeatWave MySQL: Query Execution, Performance, Benchmarks, and Vector type. Q&A with Kaan Kara. ODBMS.ORG MARCH 4, 2025

…………………………………….

